Foreword


The  present  edition  of  the  book  differs  substantially  from  the  previous  one.  Over  the 
period of time since the publication of the previous edition the author has accumulated
quite a lot of ideas concerning possible improvements to some chapters of the
book.  In  addition,  some  new  opportunities  were  found  for  an  accessible  exposition 
of  new  topics  that  had  not  appeared  in  textbooks  before  but  which  are  of  certain 
interest  for  applications  and  reflect  current  trends  in  the  development  of  modern 
probability  theory.  All  this  led  to  the  need  for  one  more  revision  of  the  book.  As 
a  result,  many  methodological  changes  were  made  and  a  lot  of  new  material  was 
added,  which  makes  the  book  more  logically  coherent  and  complete.  We  will  list 
here  only  the  main  changes  in  the  order  of  their  appearance  in  the  text. 

•  Section  4.4  “Expectations  of  Sums  of  a  Random  Number  of  Random  Variables” 
was  significantly  revised.  New  sufficient  conditions  for  Wald’s  identity  were  added. 
An  example  is  given  showing  that,  when  summands  are  non-identically  distributed, 
Wald’s identity can fail to hold even in the case when its right-hand side is
well-defined. Later on, Theorem 11.3.2 shows that, for identically distributed summands,
Wald’s  identity  is  always  valid  whenever  its  right-hand  side  is  well-defined. 

• In Sect. 6.1 a criterion of uniform integrability of random variables is constructed,
which simplifies the use of this notion. For example, the criterion directly
implies uniform integrability of weighted sums of uniformly integrable random
variables.

•  Section  7.2,  which  is  devoted  to  inversion  formulas,  was  substantially  expanded 
and  now  includes  assertions  useful  for  proving  integro-local  theorems  in  Sect.  8.7. 

• In Chap. 8, integro-local limit theorems for sums of identically distributed random
variables were added (Sects. 8.7 and 8.8). These theorems, being substantially
more  precise  assertions  than  the  integral  limit  theorems,  do  not  require  additional 
conditions  and  play  an  important  role  in  investigating  large  deviation  probabilities 
in  Chap.  9. 

• A new chapter was written on probabilities of large deviations of sums of random
variables (Chap. 9). The chapter provides a systematic and rather complete
exposition of the large deviation theory both in the case where the Cramér condition
(rapid  decay  of  distributions  at  infinity)  is  satisfied  and  where  it  is  not.  Both  integral 
and  integro-local  theorems  are  obtained.  The  large  deviation  principle  is  established. 

•  Assertions  concerning  the  case  of  non-identically  distributed  random  variables 
were added in Chap. 10 on “Renewal Processes”. Among them are renewal theorems
as well as the law of large numbers and the central limit theorem for renewal
processes.  A  new  section  was  written  to  present  the  theory  of  generalised  renewal 
processes. 

•  An  extension  of  the  Kolmogorov  strong  law  of  large  numbers  to  the  case 
of  non-identically  distributed  random  variables  having  the  first  moment  only  was 
added to Chap. 11. A new subsection on the “Strong law of large numbers for generalised
renewal processes” was written.

•  Chapter  12  on  “Random  walks  and  factorisation  identities”  was  substantially 
revised.  A  number  of  new  sections  were  added:  on  finding  factorisation  components 
in  explicit  form,  on  the  asymptotic  properties  of  the  distribution  of  the  suprema  of 
cumulated  sums  and  generalised  renewal  processes,  and  on  the  distribution  of  the 
first  passage  time. 

•  In  Chap.  13,  devoted  to  Markov  chains,  a  section  on  “The  law  of  large  numbers 
and  central  limit  theorem  for  sums  of  random  variables  defined  on  a  Markov  chain” 
was  added. 

• Three new appendices (6, 7 and 8) were written. They present important auxiliary
material on the following topics: “The basic properties of regularly varying
functions  and  subexponential  distributions”,  “Proofs  of  theorems  on  convergence  to 
stable  laws”,  and  “Upper  and  lower  bounds  for  the  distributions  of  sums  and  maxima 
of  sums  of  independent  random  variables”. 

As  has  already  been  noted,  these  are  just  the  most  significant  changes;  there  are 
also  many  others.  A  lot  of  typos  and  other  inaccuracies  were  fixed.  The  process  of 
creating  new  typos  and  misprints  in  the  course  of  one’s  work  on  a  book  is  random 
and can be well described mathematically by the Poisson process (for the definition
of Poisson processes, see Chaps. 10 and 19). An important characteristic of the
quality  of  a  book  is  the  intensity  of  this  process.  Unfortunately,  I  am  afraid  that  in 
the  two  previous  editions  (1999  and  2003)  this  intensity  perhaps  exceeded  a  certain 
acceptable  level.  Not  renouncing  his  own  responsibility,  the  author  still  admits  that 
this  may  be  due,  to  some  extent,  to  the  fact  that  the  publication  of  these  editions  took 
place  at  the  time  of  a  certain  decline  of  the  publishing  industry  in  Russia  related  to 
the general state of the economy at that time (in the 1972, 1976 and 1986 editions
there were far fewer such defects).


Before starting to work on the new edition, I asked my colleagues from our laboratory
at the Sobolev Institute of Mathematics and from the Chair of Probability
Theory and Mathematical Statistics at Novosibirsk State University to prepare lists
of any typos and other inaccuracies they had spotted in the book, as well as suggested
improvements of exposition. I am very grateful to everyone who provided
me with such information. I would like to express special thanks to I.S. Borisov,
V.I. Lotov, A.A. Mogul’sky and S.G. Foss, who also offered a number of methodological
improvements.

I am also deeply grateful to T.V. Belyaeva for her invaluable assistance in typesetting
the book with its numerous changes. Without that help, the work on the new
edition  would  have  been  much  more  difficult. 

A.A. Borovkov


Foreword  to  the  Third  and  Fourth  Editions 


This  book  has  been  written  on  the  basis  of  the  Russian  version  (1986)  published 
by  “Nauka”  Publishers  in  Moscow.  A  number  of  sections  have  been  substantially 
revised  and  several  new  chapters  have  been  introduced.  The  author  has  striven  to 
provide  a  complete  and  logical  exposition  and  simpler  and  more  illustrative  proofs. 
The  1986  text  was  preceded  by  two  earlier  editions  (1972  and  1976).  The  first  one 
appeared  as  an  extended  version  of  lecture  notes  of  the  course  the  author  taught 
at  the  Department  of  Mechanics  and  Mathematics  of  Novosibirsk  State  University. 
Each  new  edition  responded  to  comments  by  the  readers  and  was  completed  with 
new  sections  which  made  the  exposition  more  unified  and  complete. 

The  readers  are  assumed  to  be  familiar  with  a  traditional  calculus  course.  They 
would  also  benefit  from  knowing  elements  of  measure  theory  and,  in  particular, 
the  notion  of  integral  with  respect  to  a  measure  on  an  arbitrary  space  and  its  basic 
properties.  However,  provided  they  are  prepared  to  use  a  less  general  version  of 
some  of  the  assertions,  this  lack  of  additional  knowledge  will  not  hinder  the  reader 
from  successfully  mastering  the  material.  It  is  also  possible  for  the  reader  to  avoid 
such  complications  completely  by  reading  the  respective  Appendices  (located  at  the 
end  of  the  book)  which  contain  all  the  necessary  results. 

The  first  ten  chapters  of  the  book  are  devoted  to  the  basics  of  probability  theory 
(including  the  main  limit  theorems  for  cumulative  sums  of  random  variables),  and  it 
is  best  to  read  them  in  succession.  The  remaining  chapters  deal  with  more  specific 
parts  of  the  theory  of  probability  and  could  be  divided  into  two  blocks:  random 
processes  in  discrete  time  (or  random  sequences,  Chaps.  12  and  14-16)  and  random 
processes  in  continuous  time  (Chaps.  17-21). 

There are also chapters which remain outside the mainstream of the text as indicated
above. These include Chap. 11 “Factorisation Identities”. The chapter not only
contains a series of very useful probabilistic results, but also displays interesting relationships
between problems on random walks in the presence of boundaries and
boundary problems of complex analysis. Chapter 13 “Information and Entropy” and
Chap. 19 “Functional Limit Theorems” also deviate from the mainstream. The former
deals with problems closely related to probability theory but very rarely treated
in texts on the discipline. The latter presents limit theorems for the convergence
of processes generated by cumulative sums of random variables to the Wiener and
Poisson  processes;  as  a  consequence,  the  law  of  the  iterated  logarithm  is  established 
in  that  chapter. 

The  book  has  incorporated  a  number  of  methodological  improvements.  Some 
parts  of  it  are  devoted  to  subjects  to  be  covered  in  a  textbook  for  the  first  time  (for 
example,  Chap.  16  on  stochastic  recursive  sequences  playing  an  important  role  in 
applications). 

The book can serve as a basis for third year courses for students with a reasonable
mathematical background, and also for postgraduates. A one-semester (or
two-trimester) course on probability theory might consist (there could be many variants)
of the following parts: Chaps. 1-2, Sects. 3.1-3.4, 4.1-4.6 (partially), 5.2 and
5.4 (partially), 6.1-6.3 (partially), 7.1, 7.2, 7.4-7.6, 8.1-8.2 and 8.4 (partially), 10.1,
10.3, and the main results of Chap. 12.

For  a  more  detailed  exposition  of  some  aspects  of  Probability  Theory  and  the 
Theory  of  Random  Processes,  see  for  example  [2,  10,  12-14,  26,  31]. 

While  working  on  the  different  versions  of  the  book,  I  received  advice  and 
help  from  many  of  my  colleagues  and  friends.  I  am  grateful  to  Yu.V.  Prokhorov, 
V.V.  Petrov  and  B.A.  Rogozin  for  their  numerous  useful  comments  which  helped 
to  improve  the  first  variant  of  the  book.  I  am  deeply  indebted  to  A.N.  Kolmogorov 
whose remarks and valuable recommendations, especially of methodological character,
contributed to improvements in the second version of the book. In regard to
the second and third versions, I am again thankful to V.V. Petrov who gave me his
comments,  and  to  P.  Franken,  with  whom  I  had  a  lot  of  useful  discussions  while  the 
book  was  translated  into  German. 

In conclusion I want to express my sincere gratitude to V.V. Yurinskii, A.I. Sakhanenko,
K.A. Borovkov, and other colleagues of mine who also gave me their comments
on the manuscript. I would also like to express my gratitude to all those who
contributed,  in  one  way  or  another,  to  the  preparation  and  improvement  of  the  book. 


A.A. Borovkov


For  the  Reader’s  Attention 


The  numeration  of  formulas,  lemmas,  theorems  and  corollaries  consists  of  three 
numbers,  of  which  the  first  two  are  the  numbers  of  the  current  chapter  and  section. 
For  instance,  Theorem  4.3.1  means  Theorem  1  from  Sect.  3  of  Chap.  4.  Section  6.2 
means  Sect.  2  of  Chap.  6. 

The  sections  marked  with  an  asterisk  may  be  omitted  in  the  first  reading. 

The  symbol  □  at  the  end  of  a  paragraph  denotes  the  end  of  a  proof  or  an  important 
argument,  when  it  should  be  pointed  out  that  the  argument  has  ended. 

The  symbol  :=,  systematically  used  in  the  book,  means  that  the  left-hand  side  is 
defined  to  be  given  by  the  right-hand  side.  The  relation  =:  has  the  opposite  meaning: 
the  right-hand  side  is  defined  by  the  left-hand  side. 

The  reader  may  find  it  useful  to  refer  to  the  Index  of  Basic  Notation  and  Subject 
index,  which  can  be  found  at  the  end  of  this  book. 
Introduction


1. It is customary to date the origins of Probability Theory to the 17th century and
relate them to combinatorial problems of games of chance. The latter can hardly be
considered a serious occupation. However, it is games of chance that led to problems
which could not be stated and solved within the framework of the then existing
mathematical models, and thereby stimulated the introduction of new concepts, approaches
and ideas. These new elements can already be encountered in writings by
P. Fermat, B. Pascal, C. Huygens and, in a more developed form and somewhat
later, in the works of J. Bernoulli, P.-S. Laplace, C.F. Gauss and others. The
above-mentioned names undoubtedly decorate the genealogy of Probability Theory which,
as  we  saw,  is  also  related  to  some  extent  to  the  vices  of  society.  Incidentally,  as  it 
soon  became  clear,  it  is  precisely  this  last  circumstance  that  can  make  Probability 
Theory  more  attractive  to  the  reader. 

The first text on Probability Theory was Huygens’ treatise De Ratiociniis in Ludo
Aleae (“On Ratiocination in Dice Games”, 1657). A little later, in 1663, the book Liber
de Ludo Aleae (“Book on Games of Chance”) by G. Cardano was published (in
fact it was written earlier, in the mid 16th century). The subject of these treatises
was the same as in the writings of Fermat and Pascal: dice and card games (problems
within the framework of Sect. 1.2 of the present book). As if Huygens foresaw
future events, he wrote that if the reader studied the subject closely, he would notice
that one was not dealing just with a game here, but rather that the foundations
of a very interesting and deep theory were being laid. Huygens’ treatise, which is
also known as the first text introducing the concept of mathematical expectation,
was later included by J. Bernoulli in his famous book Ars Conjectandi (“The Art
of Conjecturing”; published posthumously in 1713). To this book is related the notion
of the so-called Bernoulli scheme (see Sect. 1.3), for which Bernoulli gave a
cumbersome  (cf.  our  Sect.  5.1)  but  mathematically  faultless  proof  of  the  first  limit 
theorem  of  Probability  Theory,  the  Law  of  Large  Numbers. 

By the end of the 19th and the beginning of the 20th centuries, the natural sciences
led to the formulation of more serious problems which resulted in the development
of a large branch of mathematics that is nowadays called Probability Theory.
This subject is still going through a stage of intensive development. To a large extent,
Probability Theory owes its elegance, modern form and a multitude of achievements
to the remarkable Russian mathematicians P.L. Chebyshev, A.A. Markov, A.N. Kolmogorov
and others.

The  fact  that  increasing  our  knowledge  about  nature  leads  to  further  demand  for 
Probability  Theory  appears,  at  first  glance,  paradoxical.  Indeed,  as  the  reader  might 
already  know,  the  main  object  of  the  theory  is  randomness,  or  uncertainty,  which  is 
due,  as  a  rule,  to  a  lack  of  knowledge.  This  is  certainly  so  in  the  classical  example 
of  coin  tossing,  where  one  cannot  take  into  account  all  the  factors  influencing  the 
eventual  position  of  the  tossed  coin  when  it  lands. 

However, this is only an apparent paradox. In fact, there are almost no exact deterministic
quantitative laws in nature. Thus, for example, the classical law relating
the  pressure  and  temperature  in  a  volume  of  gas  is  actually  a  result  of  a  probabilistic 
nature  that  relates  the  number  of  collisions  of  particles  with  the  vessel  walls  to  their 
velocities.  The  fact  is,  at  typical  temperatures  and  pressures,  the  number  of  particles 
is  so  large  and  their  individual  contributions  are  so  small  that,  using  conventional 
instruments,  one  simply  cannot  register  the  random  deviations  from  the  relationship 
which  actually  take  place.  This  is  not  the  case  when  one  studies  more  sparse  flows 
of  particles — say,  cosmic  rays — although  there  is  no  qualitative  difference  between 
these  two  examples. 

We  could  move  in  a  somewhat  different  direction  and  name  here  the  uncertainty 
principle  stating  that  one  cannot  simultaneously  obtain  exact  measurements  of  any 
two  conjugate  observables  (for  example,  the  position  and  velocity  of  an  object). 
Here randomness is not entailed by a lack of knowledge, but rather appears as a fundamental
phenomenon reflecting the nature of things. For instance, the lifetime of a
radioactive  nucleus  is  essentially  random,  and  this  randomness  cannot  be  eliminated 
by  increasing  our  knowledge. 

Thus,  uncertainty  was  there  at  the  very  beginning  of  the  cognition  process,  and 
it  will  always  accompany  us  in  our  quest  for  knowledge.  These  are  rather  general 
comments,  of  course,  but  it  appears  that  the  answer  to  the  question  of  when  one 
should  use  the  methods  of  Probability  Theory  and  when  one  should  not  will  always 
be  determined  by  the  relationship  between  the  degree  of  precision  we  want  to  attain 
when  studying  a  given  phenomenon  and  what  we  know  about  the  nature  of  the  latter. 

2. In almost all areas of human activity there are situations where some experiments
or observations can be repeated a large number of times under the same
conditions. Probability Theory deals with those experiments of which the result (expressed
in one way or another) may vary from trial to trial. The events that refer to
the  experiment’s  result  and  which  may  or  may  not  occur  are  usually  called  random 
events. 

For example, suppose we are tossing a coin. The experiment has only two outcomes:
either heads or tails show up, and before the experiment has been carried
out, it is impossible to say which one will occur. As we have already noted, the reason
for this is that we cannot take into account all the factors influencing the final
position of the coin. A similar situation will prevail if you buy a ticket for each lottery
draw and try to predict whether it will win or not, or, observing the operation of
a complex machine, you try to determine in advance if it will have failed before or
after a given time. In such situations, it is very hard to find any laws when considering
the results of individual experiments. Therefore there is little justification for
constructing any theory here.

Fig. 1 The plot of the relative frequencies n_h/n corresponding to the outcome
sequence htthtthhhthht in the coin tossing experiment

However,  if  one  turns  to  a  long  sequence  of  repetitions  of  such  an  experiment, 
an interesting phenomenon becomes apparent. While individual results of the experiments
display a highly “irregular” behaviour, the average results demonstrate
stability. Consider, say, a long series of repetitions of our coin tossing experiment
and denote by n_h the number of heads in the first n trials. Plot the ratio n_h/n versus
the number n of conducted experiments (see Fig. 1; the plot corresponds to the
outcome sequence htthtthhhthh, where h stands for heads and t for tails, respectively).

We will then see that, as n increases, the polygon connecting the consecutive
points (n, n_h/n) very quickly approaches the straight line n_h/n = 1/2. To verify
this observation, G.L. Leclerc, comte de Buffon,¹ tossed a coin 4040 times. The
number of heads was 2048, so that the relative frequency n_h/n of heads was 0.5069.
K. Pearson tossed a coin 24,000 times and got 12,012 heads, so that n_h/n = 0.5005.
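
The running relative frequencies of heads for the twelve tosses quoted above can be tabulated directly. The following Python sketch is an illustration only; the function name running_frequencies is our own choice and does not come from the text.

```python
from fractions import Fraction


def running_frequencies(outcomes):
    """Return the relative frequency of heads after each of the first n tosses."""
    freqs = []
    heads = 0
    for n, toss in enumerate(outcomes, start=1):
        if toss == "h":
            heads += 1
        freqs.append(Fraction(heads, n))
    return freqs


# Outcome sequence quoted in the text (h = heads, t = tails).
freqs = running_frequencies("htthtthhhthh")
print([str(f) for f in freqs])

# Relative frequencies in the two historical experiments cited above.
buffon = round(2048 / 4040, 4)
pearson = round(12012 / 24000, 4)
print(buffon, pearson)
```

Exact fractions are used for the short sequence so that the early, strongly fluctuating values (1, 1/2, 1/3, ...) are visible without rounding.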

It  turns  out  that  this  phenomenon  is  universal:  the  relative  frequency  of  a  certain 
outcome  in  a  series  of  repetitions  of  an  experiment  under  the  same  conditions  tends 
towards a certain number p ∈ [0, 1] as the number of repetitions grows. It is an
objective  law  of  nature  which  forms  the  foundation  of  Probability  Theory. 

It  would  be  natural  to  define  the  probability  of  an  experiment  outcome  to  be  just 
the number p towards which the relative frequency of the outcome tends. However,
such a definition of probability (usually related to the name of R. von Mises)
has proven to be inconvenient. First of all, in reality, each time we will be dealing
not with an infinite sequence of frequencies, but rather with finitely many elements
thereof. Obtaining the entire sequence is unfeasible. Hence the frequency (let it
again be n_h/n) of the occurrence of a certain outcome will, as a rule, be different
for  each  new  series  of  repetitions  of  the  same  experiment. 

¹The data is borrowed from [15].

This fact led to intense discussions and a lot of disagreement regarding how one
should define the concept of probability. Fortunately, there was a class of phenomena
that possessed certain “symmetry” (in gambling, coin tossing etc.) for which one
could compute in advance, prior to the experiment, the expected numerical values
of the probabilities. Take, for instance, a cube made of a sufficiently homogeneous
material.  There  are  no  reasons  for  the  cube  to  fall  on  any  of  its  faces  more  often 
than  on  some  other  face.  It  is  therefore  natural  to  expect  that,  when  rolling  a  die  a 
large  number  of  times,  the  frequency  of  each  of  its  faces  will  be  close  to  1/6.  Based 
on  these  considerations,  Laplace  believed  that  the  concept  of  equiprobability  is  the 
fundamental  one  for  Probability  Theory.  The  probability  of  an  event  would  then  be 
defined  as  the  ratio  of  the  number  of  “favourable”  outcomes  to  the  total  number  of 
possible outcomes. Thus, the probability of getting an odd number of points (i.e. 1,
3 or 5) when rolling a die once was declared to be 3/6 (i.e. the number of faces with
an odd number of points was divided by the total number of all faces). If the die were
rolled ten times, then one would have 6^10 in the denominator, as this number gives
the  total  number  of  equally  likely  outcomes  and  calculating  probabilities  reduces  to 
counting  the  number  of  “favourable  outcomes”  (the  ones  resulting  in  the  occurrence 
of  a  given  event). 
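
The counting rule just described can be sketched in a few lines of Python. This is a minimal illustration under the assumption of equally likely outcomes; the helper classical_probability and the three-roll event “no six appears” are our own examples, not taken from the text.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # the six faces of a die


def classical_probability(outcomes, is_favourable):
    """Ratio of the number of favourable outcomes to the number of all outcomes."""
    outcomes = list(outcomes)
    favourable = sum(1 for outcome in outcomes if is_favourable(outcome))
    return Fraction(favourable, len(outcomes))


# One roll: an odd number of points (3 favourable faces out of 6).
p_odd = classical_probability(FACES, lambda face: face % 2 == 1)
print(p_odd)  # 1/2, i.e. 3/6 in lowest terms

# Three rolls (hypothetical example): no six appears in any roll.
p_no_six = classical_probability(product(FACES, repeat=3),
                                 lambda rolls: 6 not in rolls)
print(p_no_six)  # 125/216

# Ten rolls: the denominator mentioned in the text.
total_ten_rolls = 6 ** 10
print(total_ten_rolls)  # 60466176
```

Enumerating all 6^10 outcomes explicitly would be slow, which is why the sketch enumerates three rolls only and merely reports the size of the ten-roll outcome space.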

The development of the mathematical theory of probabilities began from the moment
when one started defining probability as the ratio of the number of favourable
outcomes to the total number of equally likely outcomes, and this approach is nowadays
called “classical” (for more details, see Chap. 1).

Later on, at the beginning of the 20th century, this approach was severely criticised
for being too restrictive. The initiator of the critique was R. von Mises. As
we have already noted, his conception was based on postulating stability of the frequencies
of events in a long series of experiments. That was a confusion of physical
and mathematical concepts. No passage to the limit can serve as justification for
introducing the notion of “probability”. If, for instance, the values n_h/n were to
converge to the limiting value 1/2 in Fig. 1 too slowly, that would mean that nobody
would be able to find the value of that limit in the general (non-classical) case.
So the approach is clearly vulnerable: it would mean that Probability Theory would
be applicable only to those situations where frequencies have a limit. But why frequencies
would have a limit remained unexplained and was not even discussed.

In  this  relation,  R.  von  Mises’  conception  has  been  in  turn  criticised  by  many 
mathematicians,  including  A.Ya.  Khinchin,  S.N.  Bernstein,  A.N.  Kolmogorov  and 
others.  Somewhat  later,  another  approach  was  suggested  that  proved  to  be  fruitful 
for  the  development  of  the  mathematical  theory  of  probabilities.  Its  general  features 
were  outlined  by  S.N.  Bernstein  in  1908.  In  1933  a  rather  short  book  “Foundations 
of  Probability  Theory”  by  A.N.  Kolmogorov  appeared  that  contained  a  complete 
and  clear  exposition  of  the  axioms  of  Probability  Theory.  The  general  construction 
of  the  concept  of  probability  based  on  Kolmogorov’s  axiomatics  removed  all  the 
obstacles  for  the  development  of  the  theory  and  is  nowadays  universally  accepted. 

The  creation  of  an  axiomatic  Probability  Theory  provided  a  solution  to  the  sixth 
Hilbert  problem  (which  concerned,  in  particular,  Probability  Theory)  that  had  been 
formulated  by  D.  Hilbert  at  the  Second  International  Congress  of  Mathematicians 
in  Paris  in  1900.  The  problem  was  on  the  axiomatic  construction  of  a  number  of 
physical  sciences,  Probability  Theory  being  classified  as  such  by  Hilbert  at  that 
time. 

An  axiomatic  foundation  separates  the  mathematical  aspect  from  the  physical: 
one  no  longer  needs  to  explain  how  and  where  the  concept  of  probability  comes 
from. The concept simply becomes a primitive one, its properties being described
by  axioms  (which  are  essentially  the  axioms  of  Measure  Theory).  However,  the 
problem  of  how  the  probability  thus  introduced  is  related  (and  can  be  applied)  to 
the  real  world  remains  open.  But  this  problem  is  mostly  removed  by  the  remarkable 
fact  that,  under  the  axiomatic  construction,  the  desired  fundamental  property  that  the 
frequencies  of  the  occurrence  of  an  event  converge  to  the  probability  of  the  event 
does take place and is a precise mathematical result. (For more details, see Chaps. 2
and 5.)²

We  will  begin  by  defining  probability  in  a  somewhat  simplified  situation,  in  the 
so-called  discrete  case. 


²Much later, in the 1960s, A.N. Kolmogorov attempted to develop a fundamentally different approach
to the notions of probability and randomness. In that approach, the measure of randomness,
say, of a sequence 0, 1, 0, 0, 1, ... consisting of 0s and 1s (or some other symbols) is the complexity
of the algorithm describing this sequence. The new approach stimulated the development of a
number  of  directions  in  contemporary  mathematics,  but,  mostly  due  to  its  complexity,  has  not  yet 
become  widely  accepted. 
Chapter 1

Discrete  Spaces  of  Elementary  Events 


Abstract  Section  1.1  introduces  the  fundamental  concept  of  probability  space, 
along  with  some  basic  terminology  and  properties  of  probability  when  it  is  easy 
to do, i.e. in the simple case of random experiments with finitely or at most countably
many outcomes. The classical scheme of finitely many equally likely outcomes
is  discussed  in  more  detail  in  Sect.  1.2.  Then  the  Bernoulli  scheme  is  introduced  and 
the  properties  of  the  binomial  distribution  are  studied  in  Sect.  1.3.  Sampling  without 
replacement  from  a  large  population  is  considered,  and  convergence  of  the  emerging 
hypergeometric distributions to the binomial one is formally proved. The
inclusion-exclusion formula for the probabilities of unions of events is derived and illustrated
by  some  applications  in  Sect.  1.4. 


1.1  Probability  Space 

To  mathematically  describe  experiments  with  random  outcomes,  we  will  first  of  all 
need the notion of the space of elementary events (or outcomes) corresponding to the
experiment under consideration. We will denote by Ω any set such that each result
of the experiment we are interested in can be uniquely specified by the elements
of Ω.

In the simplest experiments we usually deal with finite spaces of elementary outcomes.
In the coin tossing example we considered above, Ω consists of two elements,
“heads” and “tails”. In the die rolling experiment, the space Ω is also finite
and consists of 6 elements. However, even for tossing a coin (or rolling a die) one
can arrange such experiments for which finite spaces of elementary events will not
suffice. For instance, consider the following experiment: a coin is tossed until heads
shows for the first time, and then the experiment is stopped. If t designates tails in
a toss and h heads, then an “elementary outcome” of the experiment can be represented
by a sequence (tt ... th). There are infinitely many such sequences, and all
of  them  are  different,  so  there  is  no  way  to  describe  unambiguously  all  the  outcomes 
of  the  experiment  by  elements  of  a  finite  space. 
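
Although the outcomes of this experiment cannot be listed in full, they can be enumerated one after another, which is what makes the space countable. A minimal Python sketch (the generator name elementary_outcomes is a hypothetical choice of ours) indexes each outcome by the number of tails preceding the first heads:

```python
from itertools import count, islice


def elementary_outcomes():
    """Yield the outcomes h, th, tth, ... of tossing a coin until the first heads."""
    for tails in count(0):  # number of tails before the first heads
        yield "t" * tails + "h"


# The generator never terminates by itself, so take a finite prefix.
first_five = list(islice(elementary_outcomes(), 5))
print(first_five)  # ['h', 'th', 'tth', 'ttth', 'tttth']

# All outcomes produced so far are distinct: each index gives a new sequence.
prefix = list(islice(elementary_outcomes(), 1000))
print(len(set(prefix)) == len(prefix))  # True
```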

Consider finite or countably infinite spaces of elementary events Ω. These are
the so-called discrete spaces. We will denote the elements of a space Ω by the letter
ω and call them elementary events (or elementary outcomes).


The notion of the space of elementary events itself is mathematically undefinable: it is a primitive one, like the notion of a point in geometry. The specific nature of $\Omega$ will, as a rule, be of no interest to us.

Any subset $A \subset \Omega$ will be called an event (the event $A$ occurs if any of the elementary outcomes $\omega \in A$ occurs).

The union or sum of two events $A$ and $B$ is the event $A \cup B$ (which may also be denoted by $A + B$) consisting of the elementary outcomes which belong to at least one of the events $A$ and $B$. The product or intersection $AB$ (which is often denoted by $A \cap B$ as well) is the event consisting of all elementary events belonging to both $A$ and $B$. The difference of the events $A$ and $B$ is the set $A - B$ (also often denoted by $A \setminus B$) consisting of all elements of $A$ not belonging to $B$. The set $\Omega$ is called the certain event. The empty set $\emptyset$ is called the impossible event. The set $\bar{A} = \Omega - A$ is called the complementary event of $A$. Two events $A$ and $B$ are mutually exclusive if $AB = \emptyset$.

Let, for instance, our experiment consist in rolling a die twice. Here one can take the space of elementary events to be the set consisting of 36 elements $(i, j)$, where $i$ and $j$ run from 1 to 6 and denote the numbers of points that show up in the first and second roll respectively. The events $A = \{i + j \le 3\}$ and $B = \{j = 6\}$ are mutually exclusive. The product of the events $A$ and $C = \{j \text{ is even}\}$ is the event $(1, 2)$. Note that if we were interested in the events related to the first roll only, we could consider a smaller space of elementary events consisting of just 6 elements $i = 1, 2, \ldots, 6$.
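
The two-roll example can be checked by direct enumeration. The following is only an illustrative sketch: the names `omega`, `A`, `B` and `C` are chosen here for convenience, and $A$ is taken to be the event that the sum of points is at most 3 (the reading under which the product of $A$ and $C$ is the single outcome $(1, 2)$).

```python
from itertools import product

# Space of elementary events for two rolls of a die: all pairs (i, j).
omega = set(product(range(1, 7), repeat=2))

# Events are simply subsets of omega.
A = {(i, j) for (i, j) in omega if i + j <= 3}   # sum of points is at most 3
B = {(i, j) for (i, j) in omega if j == 6}       # second roll shows 6
C = {(i, j) for (i, j) in omega if j % 2 == 0}   # second roll is even

print(len(omega))   # 36
print(A & B)        # set()      -> A and B are mutually exclusive
print(A & C)        # {(1, 2)}   -> the product AC
```

Representing events as Python sets makes the union, product and difference of events correspond to the operators `|`, `&` and `-`.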

One says that the probabilities of elementary events are given if a nonnegative real-valued function $P$ is given on $\Omega$ such that $\sum_{\omega \in \Omega} P(\omega) = 1$ (one also says that the function $P$ specifies a probability distribution on $\Omega$).

The probability of an event $A$ is the number

$$P(A) := \sum_{\omega \in A} P(\omega).$$

This definition is consistent, for the series on the right-hand side is absolutely convergent.

We note here that specific numerical values of the function $P$ will also be of no interest to us: this is just an issue of the practical value of the model. For instance, it is clear that, in the case of a symmetric die, for the outcomes $1, 2, \ldots, 6$ one should put $P(1) = P(2) = \cdots = P(6) = 1/6$; for a symmetric coin, one has to choose the values $P(h) = P(t) = 1/2$ and not any others. In the experiment of tossing a coin until heads shows for the first time, one should put $P(h) = 1/2$, $P(th) = 1/2^2$, $P(tth) = 1/2^3, \ldots$ Since $\sum_{n=1}^{\infty} 2^{-n} = 1$, the function $P$ given in this way on the

outcomes of the form $(t \ldots th)$ will define a probability distribution on $\Omega$. For example, to calculate the probability that the experiment stops on an even step (that is, the probability of the event composed of the outcomes $(th)$, $(ttth)$, $\ldots$), one should consider the sum of the corresponding probabilities, which is equal to

$$\sum_{k=1}^{\infty} 2^{-2k} = \frac{1}{4} \cdot \frac{4}{3} = \frac{1}{3}.$$
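
The probability that the coin-tossing experiment stops on an even step can be approximated by partial sums of the series over the even steps. The sketch below uses exact rational arithmetic; the function names `p_stop_at` and `p_even_stop` are illustrative and not part of the text.

```python
from fractions import Fraction

def p_stop_at(n):
    """Probability that heads shows for the first time on toss n (symmetric coin)."""
    return Fraction(1, 2 ** n)

def p_even_stop(max_step):
    """Partial sum of the probabilities of stopping on steps 2, 4, ..., max_step."""
    return sum(p_stop_at(n) for n in range(2, max_step + 1, 2))

partial = p_even_stop(40)
# The partial sums increase towards 1/3; the remainder after step 40 is 1 / (3 * 4**20).
print(float(partial))
print(Fraction(1, 3) - partial == Fraction(1, 3 * 4 ** 20))
```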
In the experiments mentioned in the Introduction, where one had to guess when a device will break down, before a given time (the event $A$) or after it, quantitative estimates of the probability $P(A)$ can usually only be based on the results of the experiments themselves. The methods of estimating unknown probabilities from observation results are studied in Mathematical Statistics, the subject matter of which will be exemplified somewhat later by a problem from this chapter.

Note further that by no means can one construct models with discrete spaces of elementary events for all experiments. For example, suppose that one is measuring the energy of particles whose possible values fill the interval $[0, V]$, $V > 0$; the set of points of this interval (that is, the set of elementary events) is a continuum. Or suppose that the result of an experiment is a patient’s electrocardiogram. In this case, the result of the experiment is an element of some function space. In such cases, more general schemes are needed.

From the above definitions, making use of the absolute convergence of the series $\sum_{\omega \in A} P(\omega)$, one can easily derive the following properties of probability:

(1) $P(\emptyset) = 0$, $P(\Omega) = 1$.

(2) $P(A + B) = \sum_{\omega \in A \cup B} P(\omega) = \sum_{\omega \in A} P(\omega) + \sum_{\omega \in B} P(\omega) - \sum_{\omega \in A \cap B} P(\omega) = P(A) + P(B) - P(AB)$.

(3) $P(\bar{A}) = 1 - P(A)$.

This entails, in particular, that, for disjoint (mutually exclusive) events $A$ and $B$,

$$P(A + B) = P(A) + P(B).$$

This property of the additivity of probability continues to hold for an arbitrary number of disjoint events $A_1, A_2, \ldots$: if $A_i A_j = \emptyset$ for $i \ne j$, then

$$P\Bigl(\bigcup_{k=1}^{\infty} A_k\Bigr) = \sum_{k=1}^{\infty} P(A_k). \qquad (1.1.1)$$

This follows from the equality

$$P\Bigl(\bigcup_{k=1}^{n} A_k\Bigr) = \sum_{k=1}^{n} P(A_k)$$

and the fact that $P\bigl(\bigcup_{k=n+1}^{\infty} A_k\bigr) \to 0$ as $n \to \infty$. To prove the last relation, first enumerate the elementary events. Then we will be dealing with the sequence $\omega_1, \omega_2, \ldots$; $\bigcup_k \omega_k = \Omega$, $P\bigl(\bigcup_{k>n} \omega_k\bigr) = \sum_{k>n} P(\omega_k) \to 0$ as $n \to \infty$. Denote by $n_k$ the index of the event $A_j$ such that $\omega_k \in A_j = A_{n_k}$ (since the events are disjoint, there is at most one such event); put $n_k = 0$ if $\omega_k A_j = \emptyset$ for all $j$. If $n_k \le N < \infty$ for all $k$, then the events $A_j$ with $j > N$ are empty and the desired relation is obvious. If $N_s := \max_{k \le s} n_k \to \infty$ as $s \to \infty$, then one has $\bigcup_{j>n} A_j \subset \bigcup_{k>s} \omega_k$ for $n > N_s$, and therefore

$$P\Bigl(\bigcup_{j>n} A_j\Bigr) \le P\Bigl(\bigcup_{k>s} \omega_k\Bigr) = \sum_{k>s} P(\omega_k) \to 0 \quad \text{as } s \to \infty.$$

The required relation is proved.

For arbitrary $A$ and $B$, one has $P(A + B) \le P(A) + P(B)$. A similar inequality also holds for the sum of an arbitrary number of events:

$$P\Bigl(\bigcup_{k=1}^{\infty} A_k\Bigr) \le \sum_{k=1}^{\infty} P(A_k).$$

This follows from (1.1.1) and the representation of $\bigcup A_k$ as the union $\bigcup A_k \bar{B}_k$ of disjoint events $A_k \bar{B}_k$, where $B_k = \bigcup_{j<k} A_j$. It remains to note that $P(A_k \bar{B}_k) \le P(A_k)$.

Now  we  will  consider  several  important  special  cases. 


1.2  The  Classical  Scheme 

Let $\Omega$ consist of $n$ elements and all the outcomes be equally likely, that is $P(\omega) = 1/n$ for any $\omega \in \Omega$. In this case, the probability of any event $A$ is defined by the formula

$$P(A) := \frac{1}{n}\,\{\text{number of elements of } A\}.$$

This is the so-called classical definition of probability (the term uniform discrete distribution is also used).

Let a set $\{a_1, a_2, \ldots, a_n\}$ be given, which we will call the general population. A sample of size $k$ from the general population is an ordered sequence $(a_{j_1}, a_{j_2}, \ldots, a_{j_k})$. One can form this sequence as follows: the first element $a_{j_1}$ is chosen from the whole population. The next element $a_{j_2}$ we choose from the general population without the element $a_{j_1}$; the element $a_{j_3}$ is chosen from the general population without the elements $a_{j_1}$ and $a_{j_2}$, and so on. Samples obtained in such a way are called samples without replacement. Clearly, one must have $k \le n$ in this case. The number of such samples of size $k$ coincides with the number of arrangements of $k$ elements from $n$:

$$(n)_k := n(n-1)(n-2) \cdots (n-k+1).$$

Indeed, according to the sampling process, in the first position we can have any element of the general population, in the second position any of the remaining $(n-1)$ elements, and so on. We could prove this more formally by induction on $k$.

Assign to each of the samples without replacement the probability $1/(n)_k$. Such a sample will be called random. This is clearly the classical scheme.

Calculate the probability that $a_{j_1} = a_1$ and $a_{j_2} = a_2$. Since the remaining $k-2$ positions can be occupied by any of the remaining $n-2$ elements of the general population, the number of samples without replacement having elements $a_1$ and $a_2$ in the first two positions equals $(n-2)_{k-2}$. Therefore the probability of that event is equal to

$$\frac{(n-2)_{k-2}}{(n)_k} = \frac{1}{n(n-1)}.$$
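
For a small population this count can be confirmed by listing all ordered samples without replacement. The sketch below is only an illustration: the population is taken to be the numbers $1, \ldots, n$, and the function name `p_first_two_fixed` is introduced here for convenience.

```python
from fractions import Fraction
from itertools import permutations

def p_first_two_fixed(n, k):
    """Probability that a random sample without replacement of size k from {1, ..., n}
    has the element 1 in the first position and the element 2 in the second."""
    samples = list(permutations(range(1, n + 1), k))   # all (n)_k ordered samples
    hits = sum(1 for s in samples if s[0] == 1 and s[1] == 2)
    return Fraction(hits, len(samples))

# The result does not depend on k and equals 1 / (n (n - 1)).
print(p_first_two_fixed(6, 4))   # 1/30
print(p_first_two_fixed(5, 2))   # 1/20
```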

One can think of a sample without replacement as the result of sequential sampling from a collection of enumerated balls placed in an urn. Sampled balls are not returned to the urn.

However, one can form a sample in another way as well. One takes a ball out of the urn and records it. Then the ball is returned to the urn, and one again picks a ball from the urn; this ball is also recorded and put back into the urn, and so on. The sample obtained in this way is called a sample with replacement. At each step, one can pick any of the $n$ balls. There are $k$ such steps, so that the total number of such samples will be $n^k$. If we assign the probability $1/n^k$ to each sample, this will also be a classical scheme situation.

Calculate, for instance, the probability that, in a sample with replacement of size $k \le n$, all the elements will be different. The number of samples of elements without repetitions is the same as the number of samples without replacement, i.e. $(n)_k$. Therefore the desired probability is $(n)_k / n^k$.
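
The ratio $(n)_k / n^k$ is easy to evaluate and, for small $n$ and $k$, to compare with a direct enumeration of all samples with replacement. The following is a minimal sketch; the function names are illustrative, and the choice $n = 365$, $k = 23$ at the end is just a familiar numerical example, not one taken from the text.

```python
from itertools import product
from math import perm

def p_all_distinct(n, k):
    """(n)_k / n**k: probability that a sample with replacement of size k has no repetitions."""
    return perm(n, k) / n ** k

def p_all_distinct_bruteforce(n, k):
    """The same probability obtained by listing all n**k samples with replacement."""
    samples = list(product(range(n), repeat=k))
    good = sum(1 for s in samples if len(set(s)) == k)
    return good / len(samples)

print(p_all_distinct(5, 3), p_all_distinct_bruteforce(5, 3))   # 0.48 0.48
print(p_all_distinct(365, 23))   # slightly below 1/2
```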

We now return to sampling without replacement for the general population $\{a_1, a_2, \ldots, a_n\}$. We will be interested in the number of samples of size $k \le n$ which differ from each other in their composition only. The number of samples without replacement of size $k$ which have the same composition and are only distinguished by the order of their elements is $k!$. Hence the number of samples of different composition equals

$$\frac{(n)_k}{k!} = \binom{n}{k}.$$

This is the number of combinations of $k$ items chosen from a total of $n$ for $0 \le k \le n$. If the initial sample is random, we again get the classical probability scheme, for the probability of each new sample is

$$\frac{k!}{(n)_k} = \frac{1}{\binom{n}{k}}.$$

Let our urn contain $n$ balls, of which $n_1$ are black and $n - n_1$ white. We sample $k$ balls without replacement. What is the probability that there will be exactly $k_1$ black balls in the sample? The total number of samples which differ in the composition is, as was shown above, $\binom{n}{k}$. There are $\binom{n_1}{k_1}$ ways to choose $k_1$ black balls from the totality of $n_1$ black balls. The remaining $k - k_1$ white balls can be chosen from the totality of $n - n_1$ white balls in $\binom{n - n_1}{k - k_1}$ ways. (In what follows, we put $\binom{n}{k} = 0$ for $k < 0$ and $k > n$.) Note that clearly any collection of black balls can be combined with any collection of white balls. Therefore the total number of samples of size $k$ which differ in composition and contain exactly $k_1$ black balls is $\binom{n_1}{k_1}\binom{n - n_1}{k - k_1}$. Thus the desired probability is equal to

$$P_{n_1,n}(k_1, k) = \frac{\binom{n_1}{k_1}\binom{n - n_1}{k - k_1}}{\binom{n}{k}}.$$

The collection of numbers $P_{n_1,n}(0, k), P_{n_1,n}(1, k), \ldots, P_{n_1,n}(k, k)$ forms the so-called hypergeometric distribution. From the derived formula it follows, in particular, that, for any $0 < n_1 < n$,

$$\sum_{k_1=0}^{k} \binom{n_1}{k_1}\binom{n - n_1}{k - k_1} = \binom{n}{k}.$$


Example 1.2.1 In the 1980s, a version of a lottery called “Sportloto 6 out of 49” became rather popular in Russia. A gambler chooses six from the totality of 49 sports (designated just by numbers). The prize amount is determined by how many sports he guesses correctly from another group of six sports, to be drawn at random by a mechanical device in front of the public. What is the probability that the gambler correctly guesses all six sports? A similar question could be asked about five sports, and so on.

It is not difficult to see that this is nothing else but a problem on the hypergeometric distribution where the gambler has labelled as “white” six items in a general population consisting of 49 items. Therefore the probability that, of the six items chosen at random, $k_1$ will turn out to be “white” (i.e. will coincide with those labelled by the gambler) is equal to $P_{6,49}(k_1, k)$, where the sample size $k$ equals 6. For example, the probability of guessing all six sports correctly is

$$P_{6,49}(6, 6) = \binom{49}{6}^{-1} \approx 7.2 \times 10^{-8}.$$
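
The hypergeometric probabilities in this example can be evaluated directly. The sketch below assumes the notation $P_{n_1,n}(k_1, k)$ for the probability of exactly $k_1$ labelled items in a sample of size $k$ drawn without replacement from $n$ items, $n_1$ of which are labelled; the function name `hypergeom_pmf` is chosen here for illustration.

```python
from math import comb

def hypergeom_pmf(k1, k, n1, n):
    """P_{n1,n}(k1, k) = C(n1, k1) * C(n - n1, k - k1) / C(n, k), for 0 <= k1 <= k."""
    return comb(n1, k1) * comb(n - n1, k - k1) / comb(n, k)

# "6 out of 49": the gambler labels n1 = 6 items, and k = 6 items are drawn.
probs = [hypergeom_pmf(k1, 6, 6, 49) for k1 in range(7)]

for k1, prob in enumerate(probs):
    print(k1, prob)      # probability of guessing exactly k1 items

print(sum(probs))        # the probabilities add up to 1 (up to rounding)
```

The last entry, `probs[6]`, is the reciprocal of the number of combinations of 6 items out of 49, which is 13983816.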


In  connection  with  the  hypergeometric  distribution,  one  could  comment  on  the 
nature  of  problems  in  Probability  Theory  and  Mathematical  Statistics.  Knowing  the 
composition  of  the  general  population,  we  can  use  the  hypergeometric  distribution 
to  find  out  what  chances  different  compositions  of  the  sample  would  have.  This 
is  a  typical  direct  problem  of  probability  theory.  However,  in  the  natural  sciences 
one  usually  has  to  solve  inverse  problems:  how  to  determine  the  nature  of  general 
populations  from  the  composition  of  random  samples.  Generally  speaking,  such 
inverse  problems  form  the  subject  matter  of  Mathematical  Statistics. 


1.3  The  Bernoulli  Scheme 

Suppose one draws a sample with replacement of size $r$ from a general population consisting of two elements $\{0, 1\}$. There are $2^r$ such samples. Let $p$ be a number in the interval $[0, 1]$. Define a nonnegative function $P$ on the set $\Omega$ of all samples in the following way: if a sample $\omega$ contains exactly $k$ ones, then $P(\omega) = p^k (1 - p)^{r-k}$. To verify that $P$ is a probability, one has to prove the equality

$$P(\Omega) = 1.$$

It is easy to see that $k$ ones can be arranged in $r$ places in $\binom{r}{k}$ different ways. Therefore there is the same number of samples containing exactly $k$ ones. Now we can compute the probability of $\Omega$:

$$P(\Omega) = \sum_{k=0}^{r} \binom{r}{k} p^k (1 - p)^{r-k} = \bigl(p + (1 - p)\bigr)^r = 1.$$

The second equality here is just the binomial formula. At the same time we have found that the probability $P(k, r)$ that the sample contains exactly $k$ ones is:

$$P(k, r) = \binom{r}{k} p^k (1 - p)^{r-k}.$$

This is the so-called binomial distribution. It can be considered as the distribution of the number of “successes” in a series of $r$ trials with two possible outcomes in each trial: 1 (“success”) and 0 (“failure”). Such a series of trials with probability $P(\omega)$ defined as $p^k (1 - p)^{r-k}$, where $k$ is the number of successes in $\omega$, is called the Bernoulli scheme. It turns out that the trials in the Bernoulli scheme have the independence property which will be discussed in the next chapter.

It is not difficult to verify that the probability of having 1 at a fixed place in the sample (say, at position $s$) equals $p$. Indeed, having removed the item number $s$ from the sample, we obtain a sample from the same population, but of size $r - 1$. We will find the desired probability if we multiply the probabilities of these truncated samples by $p$ and sum over all “short” samples. Clearly, we will get $p$. This is why the number $p$ in the Bernoulli scheme is often called the success probability.

Arguing in the same way, we find that the probability of having 1 at $k$ fixed positions in the sample equals $p^k$.
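
These statements about the Bernoulli scheme can be verified for a small number of trials by building the whole space of samples. The following is a sketch under illustrative choices ($r = 5$, $p = 3/10$, and the particular fixed positions); the helper name `bernoulli_space` is not from the text.

```python
from fractions import Fraction
from itertools import product

def bernoulli_space(r, p):
    """Map each 0/1 sample of length r to its probability p^k (1 - p)^(r - k),
    where k is the number of ones in the sample."""
    return {w: p ** sum(w) * (1 - p) ** (r - sum(w)) for w in product((0, 1), repeat=r)}

r, p = 5, Fraction(3, 10)
P = bernoulli_space(r, p)

total = sum(P.values())                                    # P(Omega)
p_one_at_s = sum(q for w, q in P.items() if w[2] == 1)     # a one at one fixed position
p_ones_at_two = sum(q for w, q in P.items() if w[0] == 1 and w[3] == 1)   # ones at two fixed positions

print(total, p_one_at_s, p_ones_at_two)   # 1 3/10 9/100
```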

Now consider how the probabilities $P(k, r)$ of various outcomes behave as $k$ varies. Let us look at the ratio

$$R(k, r) := \frac{P(k, r)}{P(k - 1, r)} = \frac{p}{1 - p} \cdot \frac{r - k + 1}{k}.$$

It clearly monotonically decreases as $k$ increases, the value of the ratio being greater than 1 for $k/(r + 1) < p$ and less than 1 for $k/(r + 1) > p$. This means that the probabilities $P(k, r)$ first increase and then, for $k > p(r + 1)$, decrease as $k$ increases.

The above enables one to estimate, using the quantities $P(k, r)$, the probabilities

$$Q(k, r) = \sum_{j=0}^{k} P(j, r)$$

that the number of successes in the Bernoulli scheme does not exceed $k$. Namely, for $k < p(r + 1)$,

$$Q(k, r) = P(k, r)\Bigl(1 + \frac{1}{R(k, r)} + \frac{1}{R(k, r)R(k - 1, r)} + \cdots\Bigr) \le P(k, r)\,\frac{R(k, r)}{R(k, r) - 1} = P(k, r)\,\frac{(r + 1 - k)p}{(r + 1)p - k}.$$

It is not difficult to see that this bound will be rather sharp if the numbers $k$ and $r$ are large and the ratio $k/(pr)$ is not too close to 1. In that case the sum

$$1 + \frac{1}{R(k, r)} + \frac{1}{R(k, r)R(k - 1, r)} + \cdots$$

will be close to the sum of the geometric series

$$\sum_{j=0}^{\infty} R^{-j}(k, r) = \frac{R(k, r)}{R(k, r) - 1},$$

and we will have the approximate equality

$$Q(k, r) \approx P(k, r)\,\frac{(r + 1 - k)p}{(r + 1)p - k}. \qquad (1.3.1)$$


For example, for $r = 30$, $p = 0.7$ and $k = 16$ one has $rp = 21$ and $P(k, r) \approx 0.023$. Here the ratio $\frac{(r + 1 - k)p}{(r + 1)p - k}$ equals $15 \times 0.7 / 5.7 \approx 1.84$. Hence the right-hand side of (1.3.1) estimating $Q(k, r)$ is approximately equal to $0.023 \times 1.84 \approx 0.042$. The true value of $Q(k, r)$ for the given values of $r$, $p$ and $k$ is 0.040 (correct to three decimals).

Formula  (1.3.1)  will  be  used  in  the  example  in  Sect.  5.2. 
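
The numerical example for (1.3.1) can be reproduced with a few lines of code. This is only a sketch: the function names `binom_pmf`, `binom_cdf` and `tail_bound` are introduced here, and `tail_bound` is simply the right-hand side of (1.3.1), which is meaningful for $k < p(r + 1)$.

```python
from math import comb

def binom_pmf(k, r, p):
    """P(k, r): probability of exactly k successes in r trials."""
    return comb(r, k) * p ** k * (1 - p) ** (r - k)

def binom_cdf(k, r, p):
    """Q(k, r): probability that the number of successes does not exceed k."""
    return sum(binom_pmf(j, r, p) for j in range(k + 1))

def tail_bound(k, r, p):
    """Right-hand side of (1.3.1): P(k, r) (r + 1 - k) p / ((r + 1) p - k)."""
    return binom_pmf(k, r, p) * (r + 1 - k) * p / ((r + 1) * p - k)

r, p, k = 30, 0.7, 16
print(round(binom_pmf(k, r, p), 3))    # P(k, r)
print(round(tail_bound(k, r, p), 3))   # estimate from (1.3.1)
print(round(binom_cdf(k, r, p), 3))    # exact Q(k, r)
```

Since the estimate is obtained from an upper bound, it should never fall below the exact value of $Q(k, r)$ in the range $k < p(r + 1)$.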

Now consider a general population composed of $n$ items, of which $n_1$ are of the first type and $n_2 = n - n_1$ of the second type. Draw from it a sample without replacement of size $r$.


Theorem 1.3.1 Let $n$ and $n_1$ tend to infinity in such a way that $n_1/n \to p$, where $p$ is a number from the interval $[0, 1]$. Then the following relation holds true for the hypergeometric distribution:

$$P_{n_1,n}(r_1, r) \to P(r_1, r).$$


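Proof Divide both the numerator and denominator in the formula for $P_{n_1,n}(r_1, r)$ (see Sect. 1.2) by $n^r$. Putting $r_2 = r - r_1$ and $n_2 := n - n_1$, we get

$$P_{n_1,n}(r_1, r) = \frac{r!\,(n - r)!}{n!} \cdot \frac{n_1!}{r_1!\,(n_1 - r_1)!} \cdot \frac{n_2!}{r_2!\,(n_2 - r_2)!}$$

$$= \frac{r!}{r_1!\,r_2!} \cdot \frac{\frac{n_1}{n}\bigl(\frac{n_1}{n} - \frac{1}{n}\bigr) \cdots \bigl(\frac{n_1}{n} - \frac{r_1 - 1}{n}\bigr) \times \frac{n_2}{n}\bigl(\frac{n_2}{n} - \frac{1}{n}\bigr) \cdots \bigl(\frac{n_2}{n} - \frac{r_2 - 1}{n}\bigr)}{\frac{n}{n}\bigl(1 - \frac{1}{n}\bigr) \cdots \bigl(1 - \frac{r - 1}{n}\bigr)}$$

$$\to \binom{r}{r_1} p^{r_1} (1 - p)^{r_2} = P(r_1, r)$$

as $n \to \infty$. The theorem is proved.

□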


For sufficiently large $n$, $P_{n_1,n}(r_1, r)$ is close to $P(r_1, r)$ by the above theorem. Therefore the Bernoulli scheme can be thought of as sampling without replacement from a very large general population consisting of items of two types, the proportion of items of the first type being $p$.
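
The convergence of the hypergeometric probabilities to the binomial ones can also be observed numerically. In the sketch below the sample size $r = 5$, the proportion $p = 0.3$ and the population sizes are arbitrary illustrative choices, and `max_gap` is a helper introduced here to measure the largest difference between the two distributions.

```python
from math import comb

def hypergeom_pmf(r1, r, n1, n):
    """Probability of r1 items of the first type in a sample of size r without replacement."""
    return comb(n1, r1) * comb(n - n1, r - r1) / comb(n, r)

def binom_pmf(r1, r, p):
    """Probability of r1 successes in the Bernoulli scheme with r trials."""
    return comb(r, r1) * p ** r1 * (1 - p) ** (r - r1)

def max_gap(n, p=0.3, r=5):
    """Largest difference between the two distributions when n1 = round(p * n)."""
    n1 = round(p * n)
    return max(abs(hypergeom_pmf(r1, r, n1, n) - binom_pmf(r1, r, n1 / n))
               for r1 in range(r + 1))

gaps = [max_gap(n) for n in (20, 200, 2000, 20000)]
print(gaps)   # the gaps shrink as the population grows
```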

In  conclusion  we  will  consider  two  problems. 

Imagine $n$ bins in which we place at random $r$ enumerated particles. Each particle can be placed in any of the $n$ bins, so that the total number of different allocations of $r$ particles to $n$ bins will be $n^r$. Allocation of particles to bins can be thought of as drawing a sample with replacement of size $r$ from a general population of $n$ items. We will assume that we are dealing with the classical scheme, where the probability of each outcome is $1/n^r$.

(1) What is the probability that there are exactly $r_1$ particles in the $k$-th bin? The remaining $r - r_1$ particles which did not fall into bin $k$ are allocated to the remaining $n - 1$ bins. There are $(n - 1)^{r - r_1}$ different ways in which these $r - r_1$ particles can be placed into $n - 1$ bins. Of the totality of $r$ particles, one can choose $r - r_1$ particles which did not fall into bin $k$ in $\binom{r}{r - r_1}$ different ways. Therefore the desired probability is

$$\binom{r}{r - r_1} \frac{(n - 1)^{r - r_1}}{n^r} = \binom{r}{r - r_1} \Bigl(\frac{1}{n}\Bigr)^{r_1} \Bigl(1 - \frac{1}{n}\Bigr)^{r - r_1}.$$

This probability coincides with $P(r_1, r)$ in the Bernoulli scheme with $p = 1/n$.
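
For small $n$ and $r$ the answer to problem (1) can be compared with a direct count over all $n^r$ allocations. The following is a sketch with illustrative function names; bins are numbered from 0 in the code, and an allocation is represented as a tuple whose $i$-th entry is the bin of the $i$-th particle.

```python
from fractions import Fraction
from itertools import product
from math import comb

def p_bin_count_bruteforce(n, r, r1, k=0):
    """Share of the n**r equally likely allocations with exactly r1 particles in bin k."""
    allocations = product(range(n), repeat=r)
    hits = sum(1 for a in allocations if a.count(k) == r1)
    return Fraction(hits, n ** r)

def p_bin_count_formula(n, r, r1):
    """Binomial probability P(r1, r) with success probability p = 1/n."""
    p = Fraction(1, n)
    return comb(r, r1) * p ** r1 * (1 - p) ** (r - r1)

n, r = 4, 5
for r1 in range(r + 1):
    print(r1, p_bin_count_bruteforce(n, r, r1), p_bin_count_formula(n, r, r1))
```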

(2) Now let us compute the probability that at least one bin will be empty. Denote this event by $A$. Let $A_k$ mean that the $k$-th bin is empty; then

$$A = \bigcup_{k=1}^{n} A_k.$$

To find the probability of the event $A$, we will need a formula for the probability of a sum (union) of events. We cannot make use of the additivity of probability, for the events $A_k$ are not disjoint in our case.

Theorem 1.4.1 Let $A_1, A_2, \ldots, A_n$ be events. Then

$$P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i A_j) + \sum_{i<j<k} P(A_i A_j A_k) - \cdots + (-1)^{n-1} P(A_1 \cdots A_n).$$

Proof One has to make use of induction and the property of probability that

$$P(A + B) = P(A) + P(B) - P(AB)$$

which we proved in Sect. 1.1. For $n = 2$ the assertion of the theorem is true. Suppose it is true for any $n - 1$ events $A_1, \ldots, A_{n-1}$. Then, setting $B = \bigcup_{i=1}^{n-1} A_i$, we get

$$P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) = P(B + A_n) = P(B) + P(A_n) - P(A_n B).$$

Substituting here the known values

$$P(B) = P\Bigl(\bigcup_{i=1}^{n-1} A_i\Bigr) \quad \text{and} \quad P(A_n B) = P\Bigl(\bigcup_{i=1}^{n-1} (A_i A_n)\Bigr),$$

we obtain the assertion of the theorem.

□


Now we will turn to the second problem about bins (see the end of Sect. 1.3) and find the probability of the event $A$ that at least one bin is empty. We represented $A$ in the form $\bigcup_{k=1}^{n} A_k$, where $A_k$ denotes the event that all the $r$ particles miss the $k$-th bin. One has

$$P(A_k) = \frac{(n - 1)^r}{n^r} = \Bigl(1 - \frac{1}{n}\Bigr)^r, \quad k \le n.$$

The event $A_k A_l$ means that all $r$ particles are allocated to $n - 2$ bins with labels differing from $k$ and $l$, and therefore

$$P(A_k A_l) = \frac{(n - 2)^r}{n^r} = \Bigl(1 - \frac{2}{n}\Bigr)^r, \quad k \ne l, \; k, l \le n.$$

Similarly, for distinct $k$, $l$ and $m$,

$$P(A_k A_l A_m) = \frac{(n - 3)^r}{n^r} = \Bigl(1 - \frac{3}{n}\Bigr)^r, \quad k, l, m \le n,$$

and so on. The probability of the event $A$ is equal by Theorem 1.4.1 to

$$P(A) = n\Bigl(1 - \frac{1}{n}\Bigr)^r - \binom{n}{2}\Bigl(1 - \frac{2}{n}\Bigr)^r + \cdots = \sum_{j=1}^{n} (-1)^{j-1} \binom{n}{j} \Bigl(1 - \frac{j}{n}\Bigr)^r.$$

Discussion of this problem will be continued in Example 4.1.5.
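
The alternating sum for the probability that at least one bin is empty can be checked against a direct enumeration of allocations when $n$ and $r$ are small. This is a minimal sketch; the function names are illustrative, and the sum is written in the closed form $\sum_{j=1}^{n} (-1)^{j-1} \binom{n}{j} (1 - j/n)^r$ that results from applying Theorem 1.4.1 to the events "the $k$-th bin is empty".

```python
from fractions import Fraction
from itertools import product
from math import comb

def p_some_bin_empty(n, r):
    """Inclusion-exclusion sum over the numbers j of bins required to be empty."""
    return sum((-1) ** (j - 1) * comb(n, j) * Fraction(n - j, n) ** r
               for j in range(1, n + 1))

def p_some_bin_empty_bruteforce(n, r):
    """Share of the n**r allocations that leave at least one bin empty."""
    allocations = product(range(n), repeat=r)
    hits = sum(1 for a in allocations if len(set(a)) < n)
    return Fraction(hits, n ** r)

print(p_some_bin_empty(3, 4), p_some_bin_empty_bruteforce(3, 4))   # 5/9 5/9
print(p_some_bin_empty(4, 3))   # 1: with fewer particles than bins some bin is always empty
```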

As an example of the use of Theorem 1.4.1 we consider one more problem having many varied applications. This is the so-called matching problem.

Suppose $n$ items are arranged in a certain order. They are rearranged at random (all $n!$ permutations are equally likely). What is the probability that at least one element retains its position?

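There are $n!$ different permutations. Let $A_k$ denote the event that the $k$-th item retains its position. This event is composed of $(n - 1)!$ outcomes, so its probability equals

$$P(A_k) = \frac{(n - 1)!}{n!}.$$

The event $A_k A_l$ means that the $k$-th and $l$-th items retain their positions; hence

$$P(A_k A_l) = \frac{(n - 2)!}{n!}, \quad \ldots, \quad P(A_1 \cdots A_n) = \frac{(n - n)!}{n!} = \frac{1}{n!}.$$

Now $\bigcup_{k=1}^{n} A_k$ is precisely the event that at least one item retains its position. Therefore we can make use of Theorem 1.4.1 to obtain

$$P\Bigl(\bigcup_{k=1}^{n} A_k\Bigr) = \binom{n}{1} \frac{(n - 1)!}{n!} - \binom{n}{2} \frac{(n - 2)!}{n!} + \binom{n}{3} \frac{(n - 3)!}{n!} - \cdots + \frac{(-1)^{n-1}}{n!}$$

$$= 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + \frac{(-1)^{n-1}}{n!} = 1 - \Bigl(1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \cdots + \frac{(-1)^{n}}{n!}\Bigr).$$

The last expression in the parentheses is the first $n + 1$ terms of the expansion of $e^{-1}$ into a series. Therefore, as $n \to \infty$,

$$P\Bigl(\bigcup_{k=1}^{n} A_k\Bigr) \to 1 - e^{-1} \approx 0.63212.$$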
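
The matching problem lends itself to a quick numerical check: for small $n$ one can list all permutations, and for larger $n$ the alternating sum can be compared with $1 - e^{-1}$. The sketch below uses illustrative function names and exact rational arithmetic for the finite sums.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial

def p_match_formula(n):
    """1 - 1/2! + 1/3! - ... + (-1)^(n-1)/n!, obtained from Theorem 1.4.1."""
    return sum(Fraction((-1) ** (k - 1), factorial(k)) for k in range(1, n + 1))

def p_match_bruteforce(n):
    """Share of the n! permutations in which at least one item keeps its position."""
    perms = list(permutations(range(n)))
    hits = sum(1 for s in perms if any(s[i] == i for i in range(n)))
    return Fraction(hits, len(perms))

print(p_match_formula(5), p_match_bruteforce(5))   # 19/30 19/30
print(float(p_match_formula(12)), 1 - exp(-1))     # already very close to each other
```

The convergence is fast because the error of the partial sum is bounded by the first omitted term, $1/(n + 1)!$.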


Chapter  2 

An  Arbitrary  Space  of  Elementary  Events 


Abstract The chapter begins with the axiomatic construction of the probability space in the general case where the number of outcomes of an experiment is not necessarily countable. The concepts of algebra and sigma-algebra of sets are introduced and discussed in detail. Then the axioms of probability and, more generally, measure are presented and illustrated by several fundamental examples of measure spaces. The idea of extension of a measure is discussed, based on the Carathéodory theorem (the proof of which is given in Appendix 1). Then the general elementary properties of probability are discussed in detail in Sect. 2.2. Conditional probability given an event is introduced along with the concept of independence in Sect. 2.3. The chapter concludes with Sect. 2.4 presenting the total probability formula and the Bayes formula, the former illustrated by an example leading to the introduction of the Poisson process.

2.1  The  Axioms  of  Probability  Theory.  A  Probability  Space 

So far we have been considering problems in which the set of outcomes had at most countably many elements. In such a case we defined the probability $P(A)$ using the probabilities $P(\omega)$ of elementary outcomes $\omega$. It proved to be a function defined on all the subsets $A$ of the space $\Omega$ of elementary events having the following properties:

(1) $P(A) \ge 0$.

(2) $P(\Omega) = 1$.

(3) For disjoint events $A_1, A_2, \ldots$

$$P\Bigl(\bigcup_{j} A_j\Bigr) = \sum_{j} P(A_j).$$

However, as we have already noted, one can easily imagine a problem in which the set of all outcomes is uncountable. For example, choosing a point at random from the segment $[t_1, t_2]$ (say, in an experiment involving measurement of temperature) has a continuum of outcomes, for any point of the segment could be the result of the experiment. While in experiments with finite or countable sets of outcomes any collection of outcomes was an event, this is not the case in this example. We will encounter serious difficulties if we treat any subset of the segment as an event. Here one needs to select a special class of subsets which will be treated as events.

Let the space of elementary events $\Omega$ be an arbitrary set, and $\mathcal{A}$ be a system of subsets of $\Omega$.

Definition 2.1.1 $\mathcal{A}$ is called an algebra if the following conditions are met:

A1. $\Omega \in \mathcal{A}$.

A2. If $A \in \mathcal{A}$ and $B \in \mathcal{A}$, then

$$A \cup B \in \mathcal{A}, \quad A \cap B \in \mathcal{A}.$$

A3. If $A \in \mathcal{A}$ then $\bar{A} \in \mathcal{A}$.

It is not hard to see that in condition A2 it suffices to require that only one of the given relations holds. The second relation will be satisfied automatically since

$$A \cap B = \overline{\bar{A} \cup \bar{B}}.$$

An algebra $\mathcal{A}$ is sometimes called a ring since there are two operations defined on $\mathcal{A}$ (addition and multiplication) which do not lead outside of $\mathcal{A}$. An algebra $\mathcal{A}$ is a ring with identity, for $\Omega \in \mathcal{A}$ and $A\Omega = \Omega A = A$ for any $A \in \mathcal{A}$.

Definition 2.1.2 A class of sets $\mathfrak{F}$ is called a sigma-algebra ($\sigma$-algebra, or $\sigma$-ring, or Borel field of events) if property A2 is satisfied for any sequences of sets:

A2'. If $\{A_n\}$ is a sequence of sets from $\mathfrak{F}$, then

$$\bigcup_{n=1}^{\infty} A_n \in \mathfrak{F}, \quad \bigcap_{n=1}^{\infty} A_n \in \mathfrak{F}.$$

Here, as was the case for A2, it suffices to require that only one of the two relations be satisfied. The second relation will follow from the equality

$$\bigcap_{n} A_n = \overline{\bigcup_{n} \bar{A}_n}.$$

Thus an algebra is a class of sets which is closed under a finite number of operations of taking complements, unions and intersections; a $\sigma$-algebra is a class of sets which is closed under a countable number of such operations.

Given a set $\Omega$ and an algebra or $\sigma$-algebra $\mathfrak{F}$ of its subsets, one says that we are given a measurable space $\langle \Omega, \mathfrak{F} \rangle$.

For the segment $[0, 1]$, all the sets consisting of a finite number of segments or intervals form an algebra, but not a $\sigma$-algebra.

Consider all the $\sigma$-algebras on $[0, 1]$ containing all intervals from that segment (there is at least one such $\sigma$-algebra, for the collection of all the subsets of a given set clearly forms a $\sigma$-algebra). It is easy to see that the intersection of all such $\sigma$-algebras (i.e. the collection of all the sets which belong simultaneously to all the $\sigma$-algebras) is again a $\sigma$-algebra. It is the smallest $\sigma$-algebra containing all intervals and is called the Borel $\sigma$-algebra. Roughly speaking, the Borel $\sigma$-algebra could be thought of as the collection of sets obtained from intervals by taking countably many unions, intersections and complements. This is a rather rich class of sets which is certainly sufficient for any practical purposes. The elements of the Borel $\sigma$-algebra are called Borel sets. Everything we have said in this paragraph equally applies to systems of subsets of the whole real line.

Along with the intervals $(a, b)$, the one-point sets $\{a\}$ and sets of the form $(a, b]$, $[a, b]$ and $[a, b)$ (in which $a$ and $b$ can take infinite values) are also Borel sets. This assertion follows, for example, from the representations of the form

$$\{a\} = \bigcap_{n=1}^{\infty} (a - 1/n, \, a + 1/n), \quad (a, b] = \bigcap_{n=1}^{\infty} (a, \, b + 1/n).$$

Thus all countable sets and countable unions of intervals and segments are also Borel sets.

For a given class $\mathfrak{B}$ of subsets of $\Omega$, one can again consider the intersection of all $\sigma$-algebras containing $\mathfrak{B}$ and obtain in this way the smallest $\sigma$-algebra containing $\mathfrak{B}$.

Definition 2.1.3 The smallest $\sigma$-algebra containing $\mathfrak{B}$ is called the $\sigma$-algebra generated by $\mathfrak{B}$ and is denoted by $\sigma(\mathfrak{B})$.

In this terminology, the Borel $\sigma$-algebra in the $n$-dimensional Euclidean space $\mathbb{R}^n$ is the $\sigma$-algebra generated by rectangles or balls. If $\Omega$ is countable, then the $\sigma$-algebra generated by the elements $\omega \in \Omega$ clearly coincides with the $\sigma$-algebra of all subsets of $\Omega$.
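
For a finite set the generated class can be computed mechanically, which may help to build intuition for the definition. The sketch below is only an illustration on a finite space (where an algebra is automatically a $\sigma$-algebra); the function name `generated_algebra` and the sample generators are assumptions made here, not constructions from the text.

```python
from itertools import combinations

def generated_algebra(omega, generators):
    """Smallest class of subsets of a finite omega that contains omega and the generators
    and is closed under complements and pairwise unions (hence also intersections)."""
    omega = frozenset(omega)
    sets = {omega} | {frozenset(g) for g in generators}
    while True:
        new = {omega - s for s in sets}                    # complements
        new |= {a | b for a, b in combinations(sets, 2)}   # pairwise unions
        if new <= sets:                                    # nothing new: the class is closed
            return sets
        sets |= new

# Two overlapping generators split {0, ..., 5} into the "atoms" {0}, {1}, {2}, {3, 4, 5}.
alg = generated_algebra(range(6), [{0, 1}, {1, 2}])
print(len(alg))   # 16 = 2**4, one set for every union of atoms

# Generating by the individual points gives all subsets.
print(len(generated_algebra(range(4), [{i} for i in range(4)])))   # 16 = 2**4
```

The loop simply repeats the closure operations until no new sets appear, which must happen because a finite set has finitely many subsets.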

As an exercise, we suggest that the reader describe the algebra and the $\sigma$-algebra of sets in $\Omega = [0, 1]$ generated by: (a) the intervals $(0, 1/3)$ and $(1/3, 1)$; (b) the semi-open intervals $(a, 1]$, $0 < a < 1$; and (c) individual points.

To formalise a probabilistic problem, one has to find an appropriate measurable
space $(\Omega, \mathfrak{F})$ for the corresponding experiment. The symbol $\Omega$ denotes the set of
elementary outcomes of the experiment, while the algebra or $\sigma$-algebra $\mathfrak{F}$ specifies a
class of events. All the remaining subsets of $\Omega$ which are not elements of $\mathfrak{F}$ are not
events. Rather often it is convenient to define the class of events $\mathfrak{F}$ as the $\sigma$-algebra
generated by a certain algebra $\mathcal{A}$.

Selecting a specific algebra or $\sigma$-algebra $\mathfrak{F}$ depends, on the one hand, on the
nature of the problem in question and, on the other hand, on that of the set $\Omega$. As
we will see, one cannot always define probability in such a way that it would make
sense for any subset of $\Omega$.

We have already noted in Chap. 1 that, in probability theory, one uses, along
with the usual set theory terminology, a somewhat different terminology related to
the fact that the subsets of $\Omega$ (belonging to $\mathfrak{F}$) are interpreted as events. The set $\Omega$
itself is often called the certain event. By axioms A1 and A2, the empty set $\varnothing$ also
belongs to $\mathfrak{F}$; it is called the impossible event. The event $\overline{A}$ is called the complement
event or simply the complement of $A$. If $A \cap B = \varnothing$, then the events $A$ and $B$ are
called mutually exclusive or disjoint.

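Now it remains to introduce the notion of probability. Consider a space $\Omega$ and a
system $\mathcal{A}$ of its subsets which forms an algebra of events.

Definition 2.1.4 A probability on $(\Omega, \mathcal{A})$ is a real-valued function defined on the
sets from $\mathcal{A}$ and having the following properties:

P1. $\mathbf{P}(A) \ge 0$ for any $A \in \mathcal{A}$.

P2. $\mathbf{P}(\Omega) = 1$.

P3. If a sequence of events $\{A_n\}$ is such that $A_iA_j = \varnothing$ for $i \ne j$ and $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$,
then

$$\mathbf{P}\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) = \sum_{n=1}^{\infty} \mathbf{P}(A_n). \tag{2.1.1}$$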


These  properties  can  be  considered  as  an  axiomatic  definition  of  probability. 

Axiom P3 is equivalent to the requirement of additivity (2.1.1) for finite
collections of events $A_j$ together with the following continuity axiom.

P3'. Let $\{B_n\}$ be a sequence of events such that $B_{n+1} \subset B_n$ and $\bigcap_{n=1}^{\infty} B_n = B \in \mathcal{A}$.
Then $\mathbf{P}(B_n) \to \mathbf{P}(B)$ as $n \to \infty$.

Proof of the equivalence Assume P3 is satisfied and let $B_{n+1} \subset B_n$, $\bigcap_n B_n = B \in \mathcal{A}$.
Then the sequence of the events $B$, $C_k = B_k\overline{B}_{k+1}$, $k = 1, 2, \ldots$, consists
of disjoint events and $B_n = B + \bigcup_{k=n}^{\infty} C_k$ for any $n$. Now making use of property
P3 we see that the series $\mathbf{P}(B_1) = \mathbf{P}(B) + \sum_{k=1}^{\infty} \mathbf{P}(C_k)$ is convergent, which means
that

$$\mathbf{P}(B_n) = \mathbf{P}(B) + \sum_{k=n}^{\infty} \mathbf{P}(C_k) \to \mathbf{P}(B)$$

as $n \to \infty$. This is just the property P3'.

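Conversely, if $A_n$ is a sequence of disjoint events, then

$$\mathbf{P}\Bigl(\bigcup_{k=1}^{\infty} A_k\Bigr) = \mathbf{P}\Bigl(\bigcup_{k=1}^{n} A_k\Bigr) + \mathbf{P}\Bigl(\bigcup_{k=n+1}^{\infty} A_k\Bigr)$$

and one has

$$\sum_{k=1}^{\infty} \mathbf{P}(A_k) = \lim_{n\to\infty} \sum_{k=1}^{n} \mathbf{P}(A_k) = \lim_{n\to\infty} \mathbf{P}\Bigl(\bigcup_{k=1}^{n} A_k\Bigr) = \lim_{n\to\infty} \Bigl[\mathbf{P}\Bigl(\bigcup_{k=1}^{\infty} A_k\Bigr) - \mathbf{P}\Bigl(\bigcup_{k=n+1}^{\infty} A_k\Bigr)\Bigr] = \mathbf{P}\Bigl(\bigcup_{k=1}^{\infty} A_k\Bigr).$$

The last equality follows from P3'. □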


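Definition 2.1.5 A triple $(\Omega, \mathcal{A}, \mathbf{P})$ is called a wide-sense probability space. If an
algebra $\mathfrak{F}$ is a $\sigma$-algebra ($\mathfrak{F} = \sigma(\mathfrak{F})$), then the condition $\bigcup A_n \in \mathfrak{F}$ in axiom P3 (for
a probability on $(\Omega, \mathfrak{F})$) will be automatically satisfied.

Definition 2.1.6 A triple $(\Omega, \mathfrak{F}, \mathbf{P})$, where $\mathfrak{F}$ is a $\sigma$-algebra, is called a probability
space.

A probability $\mathbf{P}$ on $(\Omega, \mathfrak{F})$ is also sometimes called a probability distribution on
$\Omega$ or just a distribution on $\Omega$ (on $(\Omega, \mathfrak{F})$).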

Thus defining a probability space means defining a countably additive nonnegative
measure on a measurable space such that the measure of $\Omega$ is equal to one.
In this form the axiomatics of Probability Theory was formulated by A.N. Kolmogorov.
The system of axioms we introduced is incomplete and consistent.

Constructing a probability space $(\Omega, \mathfrak{F}, \mathbf{P})$ is the basic stage in creating a
mathematical model (formalisation) of an experiment.

Discussions on what one should understand by probability have a long history
and are related to the desire to connect the definition of probability with its “physical”
nature. However, because of the complexity of the latter, such attempts have
always encountered difficulties not only of mathematical, but also of philosophical
character (see the Introduction). The most important stages in this discussion are
related to the names of Borel, von Mises, Bernstein and Kolmogorov. The emergence
of Kolmogorov’s axiomatics separated, in a sense, the mathematical aspect of the
problem from all the rest. With this approach, the “physical interpretation” of the
notion of probability appears in the form of a theorem (the strong law of large numbers,
see Chaps. 5 and 7), by virtue of which the relative frequency of the occurrence
of a certain event in an increasingly long series of independent trials approaches (in
a strictly defined sense) the probability of this event.

We now consider examples of the most commonly used measurable and probability
spaces.

1. Discrete measurable spaces. These are spaces $(\Omega, \mathfrak{F})$ where $\Omega$ is a finite or
countably infinite collection of elements, and the $\sigma$-algebra $\mathfrak{F}$ usually consists of
all the subsets of $\Omega$. Discrete probability spaces constructed on discrete measurable
spaces were studied, with concrete examples, in Chap. 1.

2. The measurable space $(\mathbb{R}, \mathfrak{B})$, where $\mathbb{R}$ is the real line (or a part of it) and $\mathfrak{B}$
is the $\sigma$-algebra of Borel sets. The necessity of considering such spaces arises in
situations where the results of observations of interest may assume any values in $\mathbb{R}$.

Example 2.1.1 Consider an experiment consisting of choosing a point “at random”
from the interval $[0, 1]$. By this we will understand the following. The set of elementary
outcomes $\Omega$ is the interval $[0, 1]$. The $\sigma$-algebra $\mathfrak{F}$ will be taken to be the class
of subsets $B$ for which the notion of length (Lebesgue measure) $\mu(B)$ is defined,
for example, the $\sigma$-algebra $\mathfrak{B}$ of Borel measurable sets. To “conduct a trial” means
to choose a point $\omega \in \Omega = [0, 1]$, the probability of the event $\omega \in B$ being $\mu(B)$. All
the axioms are clearly satisfied for the probability space $([0, 1], \mathfrak{B}, \mu)$. We obtain
the so-called uniform distribution on $[0, 1]$.

Why did we take the $\sigma$-algebra of Borel sets $\mathfrak{B}$ to be our $\mathfrak{F}$ in this example? If we
considered on $\Omega = [0, 1]$ the $\sigma$-algebra generated by “individual” points of the interval,
we would get the sets of which the Lebesgue measure is either 0 or 1. In other
words, the obtained sets would be either very “dense” or very “thin” (countable), so
that the intervals $(a, b)$ for $0 < b - a < 1$ do not belong to this $\sigma$-algebra.

On the other hand, if we considered on $\Omega = [0, 1]$ the $\sigma$-algebra of all subsets of
$\Omega$, it would be impossible to define a probability measure on it in such a way that
$\mathbf{P}([a, b]) = b - a$ (i.e. to get the uniform distribution).

Turning back to the uniform distribution $\mathbf{P}$ on $\Omega = [0, 1]$, it is easy to see that
it is impossible to define this distribution using the same approach as we used to
define a probability on a discrete space of elementary events (i.e. by defining the
probabilities of elementary outcomes $\omega$). Since in this example the $\omega$'s are individual
points from $[0, 1]$, we clearly have $\mathbf{P}(\{\omega\}) = 0$ for any $\omega$.

3. The measurable space $(\mathbb{R}^n, \mathfrak{B}^n)$ is used in the cases when observations are
vectors. Here $\mathbb{R}^n$ is the $n$-dimensional Euclidean space ($\mathbb{R}^n = \mathbb{R}_1 \times \cdots \times \mathbb{R}_n$, where
$\mathbb{R}_1, \ldots, \mathbb{R}_n$ are $n$ copies of the real line), $\mathfrak{B}^n$ is the $\sigma$-algebra of Borel sets in $\mathbb{R}^n$,
i.e. the $\sigma$-algebra generated by the sets $B = B_1 \times \cdots \times B_n$, where $B_i \subset \mathbb{R}_i$ are Borel
sets on the line. Instead of $\mathbb{R}^n$ we could also consider some measurable part $\Omega \in \mathfrak{B}^n$
(for example a cube or ball), and instead of $\mathfrak{B}^n$ the restriction of $\mathfrak{B}^n$ onto $\Omega$. Thus,
similarly to the last example one can construct a probability space for choosing a
point at random from the cube $\Omega = [0, 1]^n$. We put here $\mathbf{P}(\omega \in B) = \mu(B)$, where
$\mu(B)$ is the Lebesgue measure (volume) of the set $B$. Instead of the cube $[0, 1]^n$ we
could consider any other cube, for example $[a, b]^n$, but in this case we would have
to put

$$\mathbf{P}(\omega \in B) = \mu(B)/\mu(\Omega) = \mu(B)/(b - a)^n.$$

This is the uniform distribution on a cube.

In Probability Theory one also needs to deal with more complex probability
spaces. What should one do if the result of the experiment is an infinite random sequence? In
this case the space $(\mathbb{R}^\infty, \mathfrak{B}^\infty)$ is often the most appropriate one.

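4. The measurable space $(\mathbb{R}^\infty, \mathfrak{B}^\infty)$, where

$$\mathbb{R}^\infty = \prod_{j=1}^{\infty} \mathbb{R}_j$$

is the space of all sequences $(x_1, x_2, \ldots)$ (the direct product of the spaces $\mathbb{R}_j$), and
$\mathfrak{B}^\infty$ the $\sigma$-algebra generated by the sets of the form

$$\prod_{k=1}^{N} B_{j_k} \times \prod_{j \ne j_k;\ k \le N} \mathbb{R}_j, \qquad B_{j_k} \in \mathfrak{B}_{j_k},$$

for any $N$, $j_1, \ldots, j_N$, where $\mathfrak{B}_j$ is the $\sigma$-algebra of Borel sets from $\mathbb{R}_j$.

¹ See e.g. [28], p. 80.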

5. If an experiment results, say, in a continuous function on the interval $[a, b]$
(a trajectory of a moving particle, a cardiogram of a patient, etc.), then the probability
spaces considered above turn out to be inappropriate. In such a case one should
take $\Omega$ to be the space $C(a, b)$ of all continuous functions on $[a, b]$ or the space
of all functions on $[a, b]$. The problem of choosing a suitable $\sigma$-algebra here
becomes somewhat more complicated and we will discuss it later in Chap. 18.

Now  let  us  return  to  the  definition  of  a  probability  space. 

Let a triple $(\Omega, \mathcal{A}, \mathbf{P})$ be a wide-sense probability space ($\mathcal{A}$ is an algebra). As
we have already seen, to each algebra $\mathcal{A}$ there corresponds a $\sigma$-algebra $\mathfrak{F} = \sigma(\mathcal{A})$
generated by $\mathcal{A}$. The following question is of substantial interest: does the probability
measure $\mathbf{P}$ on $\mathcal{A}$ define a measure on $\mathfrak{F} = \sigma(\mathcal{A})$? And if so, does it define
it in a unique way? In other words, to construct a probability space $(\Omega, \mathfrak{F}, \mathbf{P})$, is
it sufficient to define the probability just on some algebra $\mathcal{A}$ generating $\mathfrak{F}$ (i.e. to
construct a wide-sense probability space $(\Omega, \mathcal{A}, \mathbf{P})$, where $\sigma(\mathcal{A}) = \mathfrak{F}$)? An answer
to this important question is given by the Carathéodory theorem.

The measure extension theorem Let $(\Omega, \mathcal{A}, \mathbf{P})$ be a wide-sense probability space.
Then there exists a unique probability measure $\mathbf{Q}$ defined on $\mathfrak{F} = \sigma(\mathcal{A})$ such that

$$\mathbf{Q}(A) = \mathbf{P}(A) \quad \text{for all } A \in \mathcal{A}.$$

Corollary 2.1.1 Any wide-sense probability space $(\Omega, \mathcal{A}, \mathbf{P})$ automatically defines
a probability space $(\Omega, \mathfrak{F}, \mathbf{P})$ with $\mathfrak{F} = \sigma(\mathcal{A})$.

We will make extensive use of this fact in what follows. In particular, it implies
that to define a probability measure on the measurable space $(\mathbb{R}, \mathfrak{B})$, it suffices to
define the probability on intervals.

The proof of the Carathéodory theorem is given in Appendix 1.

To conclude this section, we make a general comment. Mathematics differs
qualitatively from such sciences as physics, chemistry, etc. in that it does not
always base its conclusions on empirical data with the help of which a naturalist
tries to answer his questions. Mathematics develops in the framework of an initial
construction or system of axioms with which one describes an object under study.
Thus mathematics and, in particular, Probability Theory, studies the nature of the
phenomena around us in a methodologically different way: one studies not the phenomena
themselves, but rather the models of these phenomena that have been created
based on human experience. The value of a particular model is determined by
the agreement of the conclusions of the theory with our observations and therefore
depends on the choice of the axioms characterising the object.

In this sense axioms P1, P2, and the additivity of probability look indisputable
and natural (see the remarks in the Introduction on desirable properties of probability).
Countable additivity of probability and the property A2' of $\sigma$-algebras are more
delicate and less easy to intuit (as incidentally are a lot of other things related to the
notion of infinity). Introducing the last two properties was essentially brought about
by the possibility of constructing a meaningful mathematical theory. Numerous applications
of Probability Theory developed from the system of axioms formulated
in the present section demonstrate its high efficiency and purposefulness.


2.2  Properties  of  Probability 

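1. $\mathbf{P}(\varnothing) = 0$. This follows from the equality $\varnothing + \Omega = \Omega$ and properties P2 and P3
of probability.

2. $\mathbf{P}(\overline{A}) = 1 - \mathbf{P}(A)$, since $A + \overline{A} = \Omega$ and $A \cap \overline{A} = \varnothing$.

3. If $A \subset B$, then $\mathbf{P}(A) \le \mathbf{P}(B)$. This follows from the relation $\mathbf{P}(A) + \mathbf{P}(\overline{A}B) = \mathbf{P}(B)$.

4. $\mathbf{P}(A) \le 1$ (by properties 3 and P2).

5. $\mathbf{P}(A \cup B) = \mathbf{P}(A) + \mathbf{P}(B) - \mathbf{P}(AB)$, since $A \cup B = A + (B - AB)$ and
$\mathbf{P}(B - AB) = \mathbf{P}(B) - \mathbf{P}(AB)$.

6. $\mathbf{P}(A \cup B) \le \mathbf{P}(A) + \mathbf{P}(B)$ follows from the previous property.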

7. The formula

$$\mathbf{P}\Bigl(\bigcup_{j=1}^{n} A_j\Bigr) = \sum_{k=1}^{n} \mathbf{P}(A_k) - \sum_{k<l} \mathbf{P}(A_kA_l) + \sum_{k<l<m} \mathbf{P}(A_kA_lA_m) - \cdots + (-1)^{n-1}\mathbf{P}(A_1 \cdots A_n)$$

has already been proved and used for discrete spaces $\Omega$. Here the reader can prove
it in exactly the same way, using induction and property 5.

Denote the sums on the right-hand side of the last formula by $Z_1, Z_2, \ldots, Z_n$,
respectively. Then statement 7 for the event $B_n = \bigcup_{j=1}^{n} A_j$ can be rewritten as

$$\mathbf{P}(B_n) = \sum_{j=1}^{n} (-1)^{j-1} Z_j.$$

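8. An important addition to property 7 is that the sequence $\sum_{j=1}^{k} (-1)^{j-1} Z_j$
approximates $\mathbf{P}(B_n)$ by turns from above and from below as $k$ grows, i.e.

$$\mathbf{P}(B_n) - \sum_{j=1}^{2k-1} (-1)^{j-1} Z_j \le 0, \qquad \mathbf{P}(B_n) - \sum_{j=1}^{2k} (-1)^{j-1} Z_j \ge 0, \qquad k = 1, 2, \ldots \tag{2.2.1}$$

This property can also be proved by induction on $n$. For $n = 2$ this property is
ascertained in 5. Let (2.2.1) be valid for any events $A_1, \ldots, A_{n-1}$ (i.e. for any $B_{n-1}$).
Then by 5 we have

$$\mathbf{P}(B_n) = \mathbf{P}(B_{n-1} \cup A_n) = \mathbf{P}(B_{n-1}) + \mathbf{P}(A_n) - \mathbf{P}\Bigl(\bigcup_{j=1}^{n-1} A_jA_n\Bigr),$$

where, in view of (2.2.1) for $k = 1$,

$$\sum_{j=1}^{n-1} \mathbf{P}(A_j) - \sum_{i<j\le n-1} \mathbf{P}(A_iA_j) \le \mathbf{P}(B_{n-1}) \le \sum_{j=1}^{n-1} \mathbf{P}(A_j), \qquad \mathbf{P}\Bigl(\bigcup_{j=1}^{n-1} A_jA_n\Bigr) \le \sum_{j=1}^{n-1} \mathbf{P}(A_jA_n).$$

Hence, for $B_n = B_{n-1} \cup A_n$, we get

$$\mathbf{P}(B_n) \le \sum_{j=1}^{n} \mathbf{P}(A_j),$$

$$\mathbf{P}(B_n) = \mathbf{P}(B_{n-1}) + \mathbf{P}(A_n) - \mathbf{P}(B_{n-1}A_n) \ge \sum_{j=1}^{n} \mathbf{P}(A_j) - \sum_{i<j\le n-1} \mathbf{P}(A_iA_j) - \sum_{j=1}^{n-1} \mathbf{P}(A_jA_n) = \sum_{j=1}^{n} \mathbf{P}(A_j) - \sum_{i<j\le n} \mathbf{P}(A_iA_j).$$

This proves (2.2.1) for $k = 1$. For $k = 2, 3, \ldots$ the proof is similar.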
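Properties 7 and 8 can be checked numerically on a small classical scheme. The Python sketch below is an illustration under stated assumptions: the 12-point space with equally likely outcomes, the four events, and the helper names `prob` and `z_sums` are hypothetical choices made here, not taken from the text. It computes the sums $Z_1, \ldots, Z_n$ exactly and compares the partial alternating sums with the probability of the union.

```python
from fractions import Fraction
from itertools import combinations

def prob(event, omega):
    """Classical probability: favourable outcomes over all outcomes."""
    return Fraction(len(event), len(omega))

def z_sums(events, omega):
    """Z_j = sum of P(A_{i1} ... A_{ij}) over all index sets of size j."""
    n = len(events)
    sums = []
    for j in range(1, n + 1):
        total = Fraction(0)
        for idx in combinations(range(n), j):
            inter = set(omega)
            for i in idx:
                inter &= events[i]
            total += prob(inter, omega)
        sums.append(total)
    return sums

# Hypothetical sample space and events.
omega = set(range(12))
events = [{0, 1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 7, 8}, {0, 4, 8, 9}]

Z = z_sums(events, omega)
p_union = prob(set().union(*events), omega)

# Partial sums Z_1, Z_1 - Z_2, Z_1 - Z_2 + Z_3, ...
partial = []
running = Fraction(0)
for j, z in enumerate(Z, start=1):
    running += (-1) ** (j - 1) * z
    partial.append(running)
```

The full alternating sum `partial[-1]` reproduces the probability of the union (property 7), while partial sums with an odd number of terms lie above it and those with an even number of terms lie below it (property 8).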

9. If $\{A_n\}$ is a monotonically increasing sequence of sets (i.e. $A_n \subset A_{n+1}$) and
$A = \bigcup_{n=1}^{\infty} A_n$, then

$$\mathbf{P}(A) = \lim_{n\to\infty} \mathbf{P}(A_n). \tag{2.2.2}$$

This is a different form of the continuity axiom equivalent to P3'.

Indeed, introducing the sets $B_n = A - A_n$, we get $B_{n+1} \subset B_n$ and $\bigcap_{n=1}^{\infty} B_n = \varnothing$.
Therefore, by the continuity axiom,

$$\mathbf{P}(A - A_n) = \mathbf{P}(A) - \mathbf{P}(A_n) \to 0$$

as $n \to \infty$. The converse assertion that (2.2.2) implies the continuity axiom can be
obtained in a similar way. □


2.3  Conditional  Probability.  Independence  of  Events  and  Trials 

We will start with examples. Let an experiment consist of three tosses of a fair
coin. The probability that heads shows up only once, i.e. that one of the elementary
events htt, tht, or tth occurs, is equal in the classical scheme to 3/8. Denote
this event by $A$. Now assume that we know in addition that the event $B$ =
{the number of heads is odd} has occurred.

What  is  the  probability  of  the  event  A  given  this  additional  information?  The 
event  B  consists  of  four  elementary  outcomes.  The  event  A  is  constituted  by  three 
outcomes  from  the  event  B.  In  the  framework  of  the  classical  scheme,  it  is  natural 
to  define  the  new  probability  of  the  event  A  to  be  3/4. 

Consider a more general example. Let a classical scheme with $n$ outcomes be
given. An event $A$ consists of $r$ outcomes, an event $B$ of $m$ outcomes, and let the
event $AB$ have $k$ outcomes. Similarly to the previous example, it is natural to define
the probability of the event $A$ given the event $B$ has occurred as

$$\mathbf{P}(A|B) = \frac{k}{m} = \frac{k/n}{m/n}.$$

The ratio is equal to $\mathbf{P}(AB)/\mathbf{P}(B)$, for

$$\mathbf{P}(AB) = \frac{k}{n}, \qquad \mathbf{P}(B) = \frac{m}{n}.$$

Now  we  can  give  a  general  definition. 


Definition 2.3.1 Let $(\Omega, \mathfrak{F}, \mathbf{P})$ be a probability space and $A$ and $B$ be arbitrary
events. If $\mathbf{P}(B) > 0$, the conditional probability of the event $A$ given $B$ has occurred
is denoted by $\mathbf{P}(A|B)$ and is defined by

$$\mathbf{P}(A|B) := \frac{\mathbf{P}(AB)}{\mathbf{P}(B)}.$$
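The three-toss example that opens this section can be reproduced by direct enumeration. The Python sketch below is illustrative only; the helper name `prob` and the encoding of outcomes as tuples of the letters "h" and "t" are assumptions made here.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three tosses of a fair coin.
omega = list(product("ht", repeat=3))

def prob(event):
    """Classical probability of an event (a set of outcomes)."""
    return Fraction(len(event), len(omega))

# A: heads shows up exactly once; B: the number of heads is odd.
A = {w for w in omega if w.count("h") == 1}
B = {w for w in omega if w.count("h") % 2 == 1}

# Conditional probability by Definition 2.3.1: P(A|B) = P(AB) / P(B).
p_a_given_b = prob(A & B) / prob(B)
```

Here `prob(A)` equals 3/8 and `p_a_given_b` equals 3/4, in agreement with the values obtained in the text.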


Definition 2.3.2 Events $A$ and $B$ are called independent if

$$\mathbf{P}(AB) = \mathbf{P}(A)\mathbf{P}(B).$$


Below  we  list  several  properties  of  independent  events. 

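1. If $\mathbf{P}(B) > 0$, then the independence of $A$ and $B$ is equivalent to the equality

$$\mathbf{P}(A|B) = \mathbf{P}(A).$$

The proof is obvious.

2. If $A$ and $B$ are independent, then $\overline{A}$ and $B$ are also independent.
Indeed,

$$\mathbf{P}(\overline{A}B) = \mathbf{P}(B - AB) = \mathbf{P}(B) - \mathbf{P}(AB) = \mathbf{P}(B)(1 - \mathbf{P}(A)) = \mathbf{P}(\overline{A})\mathbf{P}(B).$$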

3. Let the events $A$ and $B_1$ and the events $A$ and $B_2$ each be independent, and
assume $B_1B_2 = \varnothing$. Then the events $A$ and $B_1 + B_2$ are independent.

The property is proved by the following chain of equalities:

$$\mathbf{P}(A(B_1 + B_2)) = \mathbf{P}(AB_1 + AB_2) = \mathbf{P}(AB_1) + \mathbf{P}(AB_2) = \mathbf{P}(A)(\mathbf{P}(B_1) + \mathbf{P}(B_2)) = \mathbf{P}(A)\mathbf{P}(B_1 + B_2).$$

As we will see below, the requirement $B_1B_2 = \varnothing$ is essential here.

Fig. 2.1 Illustration to Example 2.3.2: the dashed rectangles represent the events $A$ and $B$

Example 2.3.1 Let event $A$ mean that heads shows up in the first of two tosses of a
fair coin, and event $B$ that tails shows up in the second toss. The probability of each
of these events is 1/2. The probability of the intersection $AB$ is

$$\mathbf{P}(AB) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = \mathbf{P}(A)\mathbf{P}(B).$$

Therefore the events $A$ and $B$ are independent.

Example 2.3.2 Consider the uniform distribution on the square $[0, 1]^2$ (see Sect. 2.1).
Let $A$ be the event that a point chosen at random is in the region on the right of an
abscissa $a$ and $B$ the event that the point is in the region above an ordinate $b$.

Both regions are hatched in Fig. 2.1. The event $AB$ is squared in the figure.
Clearly, $\mathbf{P}(AB) = \mathbf{P}(A)\mathbf{P}(B)$, and hence the events $A$ and $B$ are independent.

It is also easy to verify that if $B$ is the event that the chosen point is inside the
triangle $ECD$ (see Fig. 2.1), then the events $A$ and $B$ will be dependent.

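Definition 2.3.3 Events $B_1, B_2, \ldots, B_n$ are jointly independent if, for any
$1 \le i_1 < i_2 < \cdots < i_r \le n$, $r = 2, 3, \ldots, n$,

$$\mathbf{P}\Bigl(\bigcap_{k=1}^{r} B_{i_k}\Bigr) = \prod_{k=1}^{r} \mathbf{P}(B_{i_k}).$$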


Pairwise independence is not sufficient for joint independence of $n$ events, as one
can see from the following example.

Example 2.3.3 (Bernstein’s example) Consider the following experiment. We roll a
symmetric tetrahedron of which three faces are painted red, blue and green respectively,
and the fourth is painted in all three colours. Event $R$ means that when the
tetrahedron stops, the bottom face has the red colour on it, event $B$ that it has the
blue colour, and $G$ the green. Since each of the three colours is present on two faces,
$\mathbf{P}(R) = \mathbf{P}(B) = \mathbf{P}(G) = 1/2$. For any two of the introduced events, the probability
of the intersection is 1/4, since any two colours are present on one face only. Since
$\frac{1}{4} = \frac{1}{2} \times \frac{1}{2}$, this implies the pairwise independence of all three events. However,

$$\mathbf{P}(RGB) = \frac{1}{4} \ne \mathbf{P}(R)\mathbf{P}(B)\mathbf{P}(G) = \frac{1}{8}. \qquad □$$

Now it is easy to construct an example in which property 3 of independent events
does not hold when $B_1B_2 \ne \varnothing$.
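Bernstein's example is small enough to verify by listing the four faces. In the following Python sketch the representation of a face as a set of colour names and the helper `prob` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations

# The four equally likely bottom faces, each given by the colours painted on it.
faces = [{"red"}, {"blue"}, {"green"}, {"red", "blue", "green"}]
colours = ["red", "blue", "green"]

def prob(*wanted):
    """Probability that the bottom face carries every colour in `wanted`."""
    hits = sum(1 for face in faces if all(c in face for c in wanted))
    return Fraction(hits, len(faces))

# Pairwise independence: P(XY) = P(X)P(Y) for each pair of colours.
pairwise = all(prob(a, b) == prob(a) * prob(b)
               for a, b in combinations(colours, 2))

# Joint independence would additionally require P(RGB) = P(R)P(B)P(G).
joint = prob(*colours) == prob("red") * prob("blue") * prob("green")
```

The flag `pairwise` is true while `joint` is false, since the triple intersection has probability 1/4 rather than 1/8.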

An  example  of  a  sequence  of  jointly  independent  events  is  given  by  the  series  of 
outcomes  of  trials  in  the  Bernoulli  scheme. 

If we assume that each outcome was obtained as a result of a separate trial, then
we will find that any event related to a fixed trial will be independent of any event
related to other trials. In such cases one speaks of a sequence of independent trials.

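To give a general definition, consider two arbitrary experiments $G_1$ and $G_2$ and
denote by $(\Omega_1, \mathfrak{F}_1, \mathbf{P}_1)$ and $(\Omega_2, \mathfrak{F}_2, \mathbf{P}_2)$ the respective probability spaces. Consider
also the “compound” experiment $G$ with the probability space $(\Omega, \mathfrak{F}, \mathbf{P})$, where
$\Omega = \Omega_1 \times \Omega_2$ is the direct product of the spaces $\Omega_1$ and $\Omega_2$, and the $\sigma$-algebra $\mathfrak{F}$ is
generated by the direct product $\mathfrak{F}_1 \times \mathfrak{F}_2$ (i.e. by the events $B = B_1 \times B_2$, $B_1 \in \mathfrak{F}_1$,
$B_2 \in \mathfrak{F}_2$).

Definition 2.3.4 We will say that the trials $G_1$ and $G_2$ are independent if, for any
$B = B_1 \times B_2$, $B_1 \in \mathfrak{F}_1$, $B_2 \in \mathfrak{F}_2$ one has

$$\mathbf{P}(B) = \mathbf{P}_1(B_1)\mathbf{P}_2(B_2) = \mathbf{P}(B_1 \times \Omega_2)\mathbf{P}(\Omega_1 \times B_2).$$

Independence of $n$ trials $G_1, \ldots, G_n$ is defined in a similar way, using the equality

$$\mathbf{P}(B) = \mathbf{P}_1(B_1) \cdots \mathbf{P}_n(B_n),$$

where $B = B_1 \times \cdots \times B_n$, $B_k \in \mathfrak{F}_k$, and $(\Omega_k, \mathfrak{F}_k, \mathbf{P}_k)$ is the probability space
corresponding to the experiment $G_k$, $k = 1, \ldots, n$.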

In the Bernoulli scheme, the probability of any sequence of outcomes consisting
of $r$ zeros and ones and containing $k$ ones is equal to $p^k(1 - p)^{r-k}$. Therefore the
Bernoulli scheme may be considered as a result of $r$ independent trials in each of
which one has 1 (success) with probability $p$ and 0 (failure) with probability $1 - p$.
Thus, the probability of $k$ successes in $r$ independent trials equals $\binom{r}{k} p^k(1 - p)^{r-k}$.
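The count behind the binomial coefficient can be confirmed by summing the probabilities of individual 0/1 sequences. The Python sketch below is an illustration; the function names `binomial_pmf` and `brute_force` and the sample values $r = 5$, $p = 1/3$ are assumptions made here.

```python
from fractions import Fraction
from itertools import product
from math import comb

def binomial_pmf(k, r, p):
    """Probability of k successes in r independent trials (closed form)."""
    return comb(r, k) * p ** k * (1 - p) ** (r - k)

def brute_force(k, r, p):
    """Same probability, obtained by summing over all 0/1 sequences with k ones."""
    total = Fraction(0)
    for seq in product((0, 1), repeat=r):
        if sum(seq) == k:
            # Each such sequence has probability p^k (1 - p)^(r - k).
            total += p ** k * (1 - p) ** (r - k)
    return total

# Hypothetical parameters.
p = Fraction(1, 3)
r = 5
pmf = [binomial_pmf(k, r, p) for k in range(r + 1)]
```

With exact rational arithmetic the two computations agree for every $k$, and the values in `pmf` add up to one.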

The  following  assertion,  which  is  in  a  sense  converse  to  the  last  one,  is  also 
true:  any  sequence  of  identical  independent  trials  with  two  outcomes  makes  up  a 
Bernoulli  scheme. 

In  Chap.  3  several  remarks  will  be  given  on  the  relationship  between  the  notions 
of  independence  we  introduced  here  and  the  common  notion  of  causality. 


2.4 The Total Probability Formula. The Bayes Formula

Let $A$ be an event and $B_1, B_2, \ldots, B_n$ be mutually exclusive events having positive
probabilities such that

$$A \subset \bigcup_{j=1}^{n} B_j.$$

The sequence of events $B_1, B_2, \ldots$ can be infinite, in which case we put $n = \infty$. The
following total probability formula holds true:

$$\mathbf{P}(A) = \sum_{j=1}^{n} \mathbf{P}(B_j)\mathbf{P}(A|B_j).$$

Proof It follows from the assumptions that

$$A = \bigcup_{j=1}^{n} B_jA.$$

Moreover, the events $AB_1, AB_2, \ldots, AB_n$ are disjoint, and hence

$$\mathbf{P}(A) = \sum_{j=1}^{n} \mathbf{P}(AB_j) = \sum_{j=1}^{n} \mathbf{P}(B_j)\mathbf{P}(A|B_j). \qquad □$$


Example 2.4.1 In experiments with colliding electron-positron beams, the probability
that during a time unit there will occur $j$ collisions leading to the birth of new
elementary particles is equal to

$$p_j = \frac{e^{-\lambda}\lambda^j}{j!}, \qquad j = 0, 1, \ldots,$$

where $\lambda$ is a positive parameter (this is the so-called Poisson distribution, to be considered
in more detail in Chaps. 3, 5 and 19). In each collision, different groups of
elementary particles can appear as a result of the interaction, and the probability of
each group is fixed and does not depend on the outcomes of other collisions. Consider
one such group, consisting of two $\mu$-mesons, and denote by $p$ the probability
of its appearance in a collision. What is the probability of the event $A_k$ that, during
a time unit, $k$ pairs of $\mu$-mesons will be born?

Assume that the event $B_j$ that there were $j$ collisions during the time unit has
occurred. Given this condition, we will have a sequence of $j$ independent trials, and
the probability of having $k$ pairs of $\mu$-mesons will be $\binom{j}{k} p^k(1 - p)^{j-k}$. Therefore
by the total probability formula,

$$\mathbf{P}(A_k) = \sum_{j=k}^{\infty} \mathbf{P}(B_j)\mathbf{P}(A_k|B_j) = \sum_{j=k}^{\infty} \frac{e^{-\lambda}\lambda^j}{j!} \cdot \frac{j!}{k!(j - k)!}\, p^k(1 - p)^{j-k} = \frac{e^{-\lambda} p^k \lambda^k}{k!} \sum_{j=0}^{\infty} \frac{(\lambda(1 - p))^j}{j!} = \frac{e^{-\lambda p}(\lambda p)^k}{k!}.$$

Thus we again obtain a Poisson distribution, but this time with parameter $\lambda p$.
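The summation in Example 2.4.1 can be checked numerically by truncating the series. The Python sketch below is illustrative only: the function names, the truncation at 80 terms and the sample values $\lambda = 3$, $p = 0.4$ are assumptions made here, chosen so that the neglected tail is far below floating-point precision.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """Poisson probability e^{-lam} lam^k / k!."""
    return exp(-lam) * lam ** k / factorial(k)

def thinned_pmf(k, lam, p, terms=80):
    """Total probability formula: sum over j >= k of P(B_j) P(A_k | B_j), truncated."""
    total = 0.0
    for j in range(k, k + terms):
        total += poisson_pmf(j, lam) * comb(j, k) * p ** k * (1 - p) ** (j - k)
    return total

# Hypothetical parameter values.
lam, p = 3.0, 0.4
max_gap = max(abs(thinned_pmf(k, lam, p) - poisson_pmf(k, lam * p))
              for k in range(10))
```

For these values `max_gap` is at the level of rounding error, consistent with the claim that the number of pairs again follows a Poisson distribution, now with parameter $\lambda p$.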

The solution above was not formalised. A formal solution would first of all
require the construction of a probability space. The space turns out to be rather
complex in this example. Denote by $\Omega_j$ the space of elementary outcomes in the
Bernoulli scheme corresponding to $j$ trials, and let $\omega_j$ denote an element of $\Omega_j$.
Then one could take $\Omega$ to be the collection of all pairs $\{(j, \omega_j)\}_{j=0}^{\infty}$, where the
number $j$ indicates the number of collisions, and $\omega_j$ is a sequence of “successes”
and “failures” of length $j$ (“success” stands for the birth of two $\mu$-mesons). If $\omega_j$
contains $k$ “successes”, one has to put

$$\mathbf{P}((j, \omega_j)) = p_j\, p^k(1 - p)^{j-k}.$$

To get $\mathbf{P}(A_k)$, it remains to sum up these probabilities over all $\omega_j$ containing $k$
successes and all $j \ge k$ (the idea of the total probability formula is used here tacitly
when splitting $A_k$ into the events $(j, \omega_j)$).

The fact that the number of collisions is described here by a Poisson distribution
could be understood from the following circumstances related to the nature of the
physical process. Let $B_j(t, u)$ be the event that there were $j$ collisions during the
time interval $[t, t + u)$. Then it turns out that:

(a) the pairs of events $B_j(v, t)$ and $B_k(v + t, u)$ related to non-overlapping time
intervals are independent for all $v$, $t$, $u$, $j$, and $k$;

(b) for small $\Delta$ the probability of a collision during the time $\Delta$ is proportional to $\Delta$:

$$\mathbf{P}(B_1(t, \Delta)) = \lambda\Delta + o(\Delta),$$

and, moreover, $\mathbf{P}(B_k(t, \Delta)) = o(\Delta)$ for $k \ge 2$.

Again using the total probability formula with the hypotheses $B_j(v, t)$, we obtain
for the probabilities $p_k(t) = \mathbf{P}(B_k(v, t))$ the following relations:

$$p_k(t + \Delta) = \sum_{j=0}^{k} p_j(t)\,\mathbf{P}\bigl(B_k(v, t + \Delta) \,\big|\, B_j(v, t)\bigr) = \sum_{j=0}^{k} p_j(t)\,\mathbf{P}(B_{k-j}(v + t, \Delta))$$

$$= o(\Delta) + p_{k-1}(t)(\lambda\Delta + o(\Delta)) + p_k(t)(1 - \lambda\Delta - o(\Delta)), \qquad k \ge 1;$$

$$p_0(t + \Delta) = p_0(t)(1 - \lambda\Delta - o(\Delta)).$$

Transforming the last equation, we find that

$$\frac{p_0(t + \Delta) - p_0(t)}{\Delta} = -\lambda p_0(t) + o(1).$$

Therefore the derivative of $p_0$ exists and is given by

$$p_0'(t) = -\lambda p_0(t).$$

In a similar way we establish the existence of

$$p_k'(t) = \lambda p_{k-1}(t) - \lambda p_k(t), \qquad k \ge 1. \tag{2.4.1}$$

Now note that since the functions $p_k(t)$ are continuous, one should put $p_0(0) = 1$,
$p_k(0) = 0$ for $k \ge 1$. Hence

$$p_0(t) = e^{-\lambda t}.$$

Using induction and substituting into (2.4.1) the function $p_{k-1}(t) = \frac{(\lambda t)^{k-1} e^{-\lambda t}}{(k-1)!}$, we
establish (it is convenient to make the substitution $p_k = e^{-\lambda t}u_k$, which turns (2.4.1)
into $u_k' = \frac{\lambda^k t^{k-1}}{(k-1)!}$) that

$$p_k(t) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}.$$

This is the Poisson distribution with parameter $\lambda t$.
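The system (2.4.1) can also be integrated numerically and compared with the closed-form solution. The Python sketch below uses the forward Euler method; this choice of scheme, the function name `euler_poisson`, the step count and the sample values $\lambda = 2$, $t = 1$ are all assumptions made for illustration, and the agreement is only approximate (the error is of the order of the step size).

```python
from math import exp, factorial

def euler_poisson(lam, t_end, k_max, steps):
    """Forward Euler for p_0' = -lam p_0, p_k' = lam p_{k-1} - lam p_k, k = 1..k_max."""
    h = t_end / steps
    # Initial conditions: p_0(0) = 1 and p_k(0) = 0 for k >= 1.
    p = [1.0] + [0.0] * k_max
    for _ in range(steps):
        new = [p[0] - lam * h * p[0]]
        for k in range(1, k_max + 1):
            new.append(p[k] + h * lam * (p[k - 1] - p[k]))
        p = new
    return p

# Hypothetical parameter values.
lam, t_end = 2.0, 1.0
numeric = euler_poisson(lam, t_end, k_max=5, steps=20000)
exact = [exp(-lam * t_end) * (lam * t_end) ** k / factorial(k) for k in range(6)]
max_err = max(abs(a - b) for a, b in zip(numeric, exact))
```

Truncating the system at `k_max` introduces no additional error for the retained components, because the equation for $p_k$ involves only $p_{k-1}$ and $p_k$.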

To understand the construction of the probability space in this problem, one
should consider the set $\Omega$ of all non-decreasing step-functions $x(t) \ge 0$, $t \ge 0$, taking
values 0, 1, 2, .... Any such function can play the role of an elementary outcome:
its jump points indicate the collision times, the value $x(t)$ itself will be the
number of collisions during the time interval $(0, t)$. To avoid a tedious argument related
to introducing an appropriate $\sigma$-algebra, for the purposes of our computations
we could treat the probability as given on the algebra $\mathcal{A}$ (see Sect. 2.1) generated
by the sets $\{x(t) = k\}$, $t \ge 0$; $k = 0, 1, \ldots$ (note that all the events considered in this
problem are just of such form). The above argument shows that one has to put

$$\mathbf{P}(x(v + t) - x(v) = k) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}.$$

(See also the treatment of Poisson processes in Chap. 19.) □


By  these  examples  we  would  like  not  only  to  illustrate  the  application  of  the  total 
probability  formula,  but  also  to  show  that  the  construction  of  probability  spaces  in 
real  problems  is  not  always  a  simple  task. 

Of course, for each particular problem, such constructions are by no means necessary,
but we would recommend carrying them out until one acquires sufficient
experience.

Assume that events $A$ and $B_1, \ldots, B_n$ satisfy the conditions stated at the beginning
of this section. If $\mathbf{P}(A) > 0$, then under these conditions the following Bayes’
formula holds true:

$$\mathbf{P}(B_j|A) = \frac{\mathbf{P}(B_j)\mathbf{P}(A|B_j)}{\sum_{k=1}^{n} \mathbf{P}(B_k)\mathbf{P}(A|B_k)}.$$

This formula is simply an alternative way of writing the equality

$$\mathbf{P}(B_j|A) = \frac{\mathbf{P}(B_jA)}{\mathbf{P}(A)},$$

where in the numerator one should make use of the definition of conditional probability,
and in the denominator, the total probability formula. In Bayes’ formula we
can take $n = \infty$, just as for the total probability formula.


Example 2.4.2 An item is manufactured by two factories. The production volume
of the second factory is $k$ times the production of the first one. The proportion of
defective items for the first factory is $P_1$, and for the second one $P_2$. Now assume
that the items manufactured by the factories during a certain time interval were
mixed up and then sent to retailers. What is the probability that you have purchased
an item produced by the second factory given the item proved to be defective?

Let $B_1$ be the event that the item you have got came from the first factory, and
$B_2$ from the second. It is easy to see that


$$P(B_1) = \frac{1}{1+k}, \qquad P(B_2) = \frac{k}{1+k}.$$

These are the so-called prior probabilities of the events $B_1$ and $B_2$. Let $A$ be the
event that the purchased item is defective. We are given conditional probabilities
$P(A \mid B_1) = P_1$ and $P(A \mid B_2) = P_2$. Now, using Bayes' formula, we can answer the
posed question:

$$P(B_2 \mid A) = \frac{\frac{k}{1+k} P_2}{\frac{1}{1+k} P_1 + \frac{k}{1+k} P_2} = \frac{k P_2}{P_1 + k P_2}.$$

Similarly, $P(B_1 \mid A) = \frac{P_1}{P_1 + k P_2}$. □

The probabilities $P(B_1 \mid A)$ and $P(B_2 \mid A)$ are sometimes called posterior probabilities
of the events $B_1$ and $B_2$ respectively, after the event $A$ has occurred.

Example 2.4.3 A student is asked to solve a numerical problem. The answer to
the problem is known to be one of the numbers $1, \ldots, k$. Solving the problem, the
student can either find the correct way of reasoning or err. The training of the student
is such that he finds a correct way of solving the problem with probability $p$. In
that case the answer he finds coincides with the right one. With the complementary
probability $1 - p$ the student makes an error. In that case we will assume that the
student can give as an answer any of the numbers $1, \ldots, k$ with equal probabilities
$1/k$.

We know that the student gave a correct answer. What is the probability that his
solution of the problem was correct?

Let $B_1$ ($B_2$) be the event that the student's solution was correct (wrong).
Then, by our assumptions, the prior probabilities of these events are $P(B_1) = p$,
$P(B_2) = 1 - p$. If the event $A$ means that the student got a correct answer, then

$$P(A \mid B_1) = 1, \qquad P(A \mid B_2) = 1/k.$$

By Bayes' formula the desired posterior probability $P(B_1 \mid A)$ is equal to

$$P(B_1 \mid A) = \frac{P(B_1) P(A \mid B_1)}{P(B_1) P(A \mid B_1) + P(B_2) P(A \mid B_2)} = \frac{p}{p + \frac{1-p}{k}} = \frac{1}{1 + \frac{1-p}{kp}}.$$

Clearly, $P(B_1 \mid A) > P(B_1) = p$ and $P(B_1 \mid A)$ is close to 1 for large $k$.
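The behaviour of this posterior probability for growing $k$ can be checked numerically. The following is a minimal sketch; the function name `posterior_correct_solution` and the value $p = 1/2$ are assumptions made only for the illustration.

```python
from fractions import Fraction

def posterior_correct_solution(p, k):
    """P(B_1 | A): the solution was correct given that the answer was correct."""
    prior_correct, prior_wrong = p, 1 - p
    like_correct, like_wrong = 1, Fraction(1, k)  # P(A | B_1), P(A | B_2)
    numerator = prior_correct * like_correct
    return numerator / (numerator + prior_wrong * like_wrong)

p = Fraction(1, 2)  # illustrative value of the prior probability
values = [posterior_correct_solution(p, k) for k in (2, 10, 100)]
```

For $p = 1/2$ the closed form $1/(1 + (1-p)/(kp))$ reduces to $k/(k+1)$, which the computed values reproduce.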


Chapter  3 

Random  Variables  and  Distribution  Functions 


Abstract  Section  3.1  introduces  the  formal  definitions  of  random  variable  and  its 
distribution,  illustrated  by  several  examples.  The  main  properties  of  distribution 
functions,  including  a  characterisation  theorem  for  them,  are  presented  in  Sect.  3.2. 
This  is  followed  by  listing  and  briefly  discussing  the  key  univariate  distributions. 
The second half of the section is devoted to considering the three types of distributions
on the real line and the distributions of functions of random variables. In
Sect. 3.3 multivariate random variables (random vectors) and their distributions are
introduced and discussed in detail, including the two key special cases: the multinomial
and the normal (Gaussian) distributions. After that, the concepts of independence
of random variables and that of classes of events are considered in Sect. 3.4,
establishing criteria for independence of random variables of different types. The
theorem on independence of sigma-algebras generated by independent algebras of
events is proved with the help of the probability approximation theorem. Then the
relationships between the introduced notions are extensively discussed. In Sect. 3.5,
the problem of existence of infinite sequences of random variables is solved with
the help of Kolmogorov's theorem on families of consistent distributions, which is
proved in Appendix 2. Section 3.6 is devoted to discussing the concept of integral in
the context of Probability Theory (a formal introduction to Integration Theory is presented
in Appendix 3). The integrals of functions of random vectors are discussed,
including the derivation of the convolution formulae for sums of independent random
variables.


3.1  Definitions  and  Examples 


Let $(\Omega, \mathfrak{F}, P)$ be an arbitrary probability space.


Definition 3.1.1 A random variable $\xi$ is a measurable function $\xi = \xi(\omega)$ mapping
$(\Omega, \mathfrak{F})$ into $(\mathbb{R}, \mathfrak{B})$, where $\mathbb{R}$ is the set of real numbers and $\mathfrak{B}$ is the $\sigma$-algebra of all
Borel sets, i.e. a function for which the inverse image $\xi^{(-1)}(B) = \{\omega : \xi(\omega) \in B\}$ of
any Borel set $B \in \mathfrak{B}$ is a set from the $\sigma$-algebra $\mathfrak{F}$.

For example, when tossing a coin once, $\Omega$ consists of two points: heads and tails.
If we put 1 in correspondence to heads and 0 to tails, we will clearly obtain a random
variable.

The number of points showing on a die will also be a random variable.

The distance from the origin to a point chosen at random in the square
$[0 \le x \le 1, 0 \le y \le 1]$ will also be a random variable, since the set $\{(x, y) : x^2 + y^2 < t\}$
is measurable. The reader might have already noticed that in these examples it is
very difficult to come up with a non-measurable function of $\omega$ which would be related
to any real problem. This is often the case, but not always. In Chap. 18, devoted
to random processes, we will be interested in sets which, generally speaking, are not
events and which require special modifications to be regarded as events.

As we have already mentioned above, it follows from the definition of a random
variable that, for any set $B$ from the $\sigma$-algebra $\mathfrak{B}$ of Borel sets on the real line,

$$\xi^{(-1)}(B) = \{\omega : \xi(\omega) \in B\} \in \mathfrak{F}.$$

Hence one can define a probability $\mathbf{F}_\xi(B) = P(\xi \in B)$ on the measurable space
$(\mathbb{R}, \mathfrak{B})$ which generates the probability space $(\mathbb{R}, \mathfrak{B}, \mathbf{F}_\xi)$.

Definition 3.1.2 The probability $\mathbf{F}_\xi(B)$ is called the distribution of the random
variable $\xi$.

Putting $B = (-\infty, x)$ one obtains the function

$$F_\xi(x) = \mathbf{F}_\xi((-\infty, x)) = P(\xi < x)$$

defined on the whole real line which is called the distribution function¹ of the random
variable $\xi$.

We will see below that the distribution function of a random variable completely
specifies its distribution and is often used to describe the latter.

Where it leads to no confusion, we will write just $\mathbf{F}$, $F(x)$ instead of $\mathbf{F}_\xi$, $F_\xi(x)$,
respectively. More generally, in what follows, as a rule, we will be using boldface
letters $\mathbf{F}$, $\mathbf{G}$, $\mathbf{I}$, $\boldsymbol{\Phi}$, $\mathbf{K}$, $\boldsymbol{\Pi}$, etc. to denote distributions, and the standard font letters $F$,
$G$, $I$, $\Phi$, ... to denote the respective distribution functions.

Since a random variable $\xi$ is a mapping of $\Omega$ into $\mathbb{R}$, one has $P(|\xi| < \infty) = 1$.
Sometimes, it is also convenient to consider along with such random variables random
variables which can assume the values $\pm\infty$ (they will be measurable mappings
of $\Omega$ into $\mathbb{R} \cup \{\pm\infty\}$). If $P(|\xi| = \infty) > 0$, we will call such random variables
$\xi(\omega)$ improper. Each situation where such random variables appear will be explicitly
noted.

¹ In the English language literature, the distribution function is conventionally defined as
$F_\xi(x) = P(\xi \le x)$. The only difference is that, with the latter definition, $F$ will be
right-continuous, cf. property F3 below.

Example 3.1.1 Consider the Bernoulli scheme with success probability $p$ and sample
size $k$ (see Sect. 3.3). As we know, the set of elementary outcomes $\Omega$ in this case
is the set of all $k$-tuples of zeros and ones. Take the $\sigma$-algebra $\mathfrak{F}$ to be the system of
all subsets of $\Omega$. Define a random variable on $\Omega$ as follows: to each $k$-tuple of zeros
and ones we relate the number of ones in this tuple.

The probability of $r$ successes is, as we already know,

$$P(r, k) = \binom{k}{r} p^r (1 - p)^{k - r}.$$

Therefore the distribution function $F(x)$ of our random variable will be defined
as

$$F(x) = \sum_{r < x} P(r, k).$$

Here the summation is over all integers $r$, $0 \le r \le k$, which are less than $x$. If $x \le 0$ then
$F(x) = 0$, and if $x > k$ then $F(x) = 1$.
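The strict inequality $r < x$ in the sum matters at integer points. A minimal sketch of this distribution function in Python is given below; the function names and the value $p = 1/2$ are assumptions made for the illustration only.

```python
from fractions import Fraction
from math import ceil, comb

def binom_pmf(r, k, p):
    """P(r, k): probability of exactly r successes in k Bernoulli trials."""
    return comb(k, r) * p**r * (1 - p)**(k - r)

def binom_cdf_left(x, k, p):
    """Left-continuous distribution function F(x) = P(xi < x)."""
    # The integers r with 0 <= r < x and r <= k are 0, 1, ..., upper - 1.
    upper = min(k + 1, max(0, ceil(x)))
    return sum(binom_pmf(r, k, p) for r in range(upper))

p = Fraction(1, 2)  # illustrative success probability
table = [binom_cdf_left(x, 3, p) for x in (0, 1, 1.5, 3, 3.1)]
```

With $k = 3$ the value at $x = 3$ is $7/8$ rather than 1, since the outcome $r = 3$ is not yet counted in $P(\xi < 3)$.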


Example 3.1.2 Suppose we choose a point at random from the segment $[a, b]$, i.e.
the probability that the chosen point is in a subset of $[a, b]$ is taken to be proportional
to the Lebesgue measure of this subset. Here, $\Omega$ is the segment $[a, b]$, the $\sigma$-algebra
$\mathfrak{F}$ is the class of Borel subsets of $[a, b]$. Define a random variable $\xi$ by

$$\xi(\omega) = \omega, \quad \omega \in [a, b],$$

i.e. the value of the random variable is equal to the number from $[a, b]$ we have chosen.
It is a measurable function. If $x \le a$, then $F(x) = P(\xi < x) = 0$. Let $x \in (a, b]$.
Then $\{\xi < x\}$ means that the point is in the interval $[a, x)$. The probability of this
event is proportional to the length of the interval, hence

$$F(x) = P(\xi < x) = \frac{x - a}{b - a}.$$

If $x > b$, then clearly $F(x) = 1$. Finally, we find that

$$F(x) = \begin{cases} 0, & x \le a, \\ \frac{x - a}{b - a}, & a < x \le b, \\ 1, & x > b. \end{cases} \tag{3.1.1}$$

This distribution function defines the so-called uniform distribution on the interval
$[a, b]$.

If $\mu$ is the Lebesgue measure on $(\mathbb{R}, \mathfrak{B})$, then, as we will see in the next
section, it is not hard to show that in this case $\mathbf{F}_\xi(B) = \mu(B \cap [a, b])/(b - a)$.


3.2  Properties  of  Distribution  Functions.  Examples 

3.2.1  The  Basic  Properties  of  Distribution  Functions 

Let $F(x)$ be the distribution function of a random variable $\xi$. Then $F(x)$ has the
following properties:

F1. Monotonicity: if $x_1 < x_2$, then $F(x_1) \le F(x_2)$.
F2. $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.
F3. Left-continuity: $\lim_{x \uparrow x_0} F(x) = F(x_0)$.


Proof Since for $x_1 \le x_2$ one has $\{\xi < x_1\} \subseteq \{\xi < x_2\}$, F1 immediately follows from
property 3 of probability (see Sect. 3.2.2).

To prove F2, consider two number sequences $\{x_n\}$ and $\{y_n\}$ such that $\{x_n\}$ is
decreasing and $x_n \to -\infty$, while $\{y_n\}$ is increasing and $y_n \to \infty$. Put $A_n = \{\xi < x_n\}$
and $B_n = \{\xi < y_n\}$. Since $x_n$ tends monotonically to $-\infty$, the sequence of sets $A_n$
decreases monotonically to $\bigcap A_n = \emptyset$. By the continuity axiom (see Sect. 3.2.1),
$P(A_n) \to 0$ as $n \to \infty$ or, which is the same, $\lim_{n \to \infty} F(x_n) = 0$. This and the
monotonicity of $F(x)$ imply that

$$\lim_{x \to -\infty} F(x) = 0.$$

Since the sequence $\{y_n\}$ tends monotonically to $\infty$, the sequence of sets $B_n$ increases
to $\bigcup B_n = \Omega$, and hence (see property 9 in Sect. 3.2.2) $P(B_n) \to 1$. This
implies, as above, that

$$\lim_{n \to \infty} F(y_n) = 1, \qquad \lim_{x \to \infty} F(x) = 1.$$

Property F3 is proved in a similar way. Let $\{x_n\}$ be an increasing sequence with
$x_n \uparrow x_0$, and put

$$A = \{\xi < x_0\}, \qquad A_n = \{\xi < x_n\}.$$

The sequence of sets $A_n$ also increases, and $\bigcup A_n = A$. Therefore, $P(A_n) \to P(A)$.
This means that

$$\lim_{x \uparrow x_0} F(x) = F(x_0). \qquad □$$


It is not hard to see that the function $F$ would be right-continuous if we put
$F(x) = P(\xi \le x)$.

With our definition, the function $F$ is generally speaking not right-continuous,
since by the continuity axiom

$$F(x + 0) - F(x) = \lim_{n \to \infty} \Big( F\Big(x + \frac{1}{n}\Big) - F(x) \Big) = \lim_{n \to \infty} P\Big(x \le \xi < x + \frac{1}{n}\Big) = P\Big( \bigcap_{n=1}^{\infty} \Big\{ \xi \in \Big[x, x + \frac{1}{n}\Big) \Big\} \Big) = P(\xi = x).$$

This means that $F(x)$ is continuous if and only if $P(\xi = x) = 0$ for any $x$. Examples
3.1.1 and 3.1.2 show that both continuous and discontinuous $F(x)$ are quite
common.
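The relation between jumps of $F$ and the probabilities $P(\xi = x)$ can be observed on a small discrete example. The sketch below uses a hypothetical three-point distribution (the dictionary `atoms`) and approximates $F(x + 0)$ by $F(x + 1/n)$ with a large $n$; both choices are assumptions made for illustration.

```python
from fractions import Fraction

# Hypothetical discrete distribution: value -> probability.
atoms = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}

def F(x):
    """Left-continuous distribution function F(x) = P(xi < x)."""
    return sum(p for v, p in atoms.items() if v < x)

def jump(x, n=10**6):
    """Approximate F(x + 0) - F(x) by F(x + 1/n) - F(x)."""
    return F(x + Fraction(1, n)) - F(x)

def left_gap(x, n=10**6):
    """F(x) - F(x - 1/n); it vanishes because F is left-continuous."""
    return F(x) - F(x - Fraction(1, n))
```

Here `jump(1)` recovers the atom at the point 1, while `jump(2)` and `left_gap(1)` are zero, in agreement with property F3.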

From the above relations it also follows that

$$P(x \le \xi \le y) = \mathbf{F}_\xi([x, y]) = F(y + 0) - F(x).$$

Theorem 3.2.1 If a function $F(x)$ has properties F1, F2 and F3, then there exist a
probability space $(\Omega, \mathfrak{F}, P)$ and a random variable $\xi$ such that $F_\xi(x) = F(x)$.

Proof First we construct a probability space $(\Omega, \mathfrak{F}, P)$. Take $\Omega$ to be the real line $\mathbb{R}$,
$\mathfrak{F}$ the $\sigma$-algebra $\mathfrak{B}$ of Borel sets. As we already know (see Sect. 3.2.1), to construct
a probability space $(\mathbb{R}, \mathfrak{B}, P)$ it suffices to define a probability on the algebra $\mathcal{A}$
generated, say, by the semi-intervals of the form $[\cdot, \cdot)$ (then $\sigma(\mathcal{A}) = \mathfrak{B}$). An arbitrary
element of the algebra $\mathcal{A}$ has the form of a finite union of disjoint semi-intervals:

$$A = \bigcup_{i=1}^{n} [a_i, b_i), \qquad a_i < b_i$$

(the values of $a_i$ and $b_i$ can be infinite). We define

$$P(A) = \sum_{i=1}^{n} \big( F(b_i) - F(a_i) \big).$$

It is clear that axioms P1 and P2 are satisfied by virtue of F1 and F2. It
remains to verify the countable additivity, or continuity, of $P$ on the algebra $\mathcal{A}$. Let
$B_n \in \mathcal{A}$, $B_{n+1} \subset B_n$, $\bigcap_{n=1}^{\infty} B_n = B \in \mathcal{A}$. One has to show that $P(B_n) \to P(B)$ as
$n \to \infty$ or, which is the same, that $P(B_n \overline{B}) \to 0$ ($B_n \overline{B} \in \mathcal{A}$). To this end, it suffices
to prove that, for any fixed $N$, $P(B_n \overline{B} C_N) \to 0$, where $C_N = [-N, N)$. Indeed, for
any given $\varepsilon > 0$, by virtue of F2 we can choose an $N$ such that $P(\overline{C}_N) < \varepsilon$. Then
$P(B_n \overline{B}\, \overline{C}_N) \le P(\overline{C}_N) < \varepsilon$ and

$$\limsup_{n \to \infty} P(B_n \overline{B}) \le \limsup_{n \to \infty} P(B_n \overline{B} C_N) + \varepsilon.$$

Since $\varepsilon$ is arbitrary, the convergence $P(B_n \overline{B} C_N) \to 0$ as $n \to \infty$ implies the required
convergence $P(B_n \overline{B}) \to 0$. It follows that we can assume that the sets $B_n$ are
bounded ($B_n \subset [-N, N)$ for some $N < \infty$). Moreover, we can assume without loss
of generality that $B$ is the empty set.

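By the above remarks, $B_n$ admits the representation

$$B_n = \bigcup_{i=1}^{k_n} [a_i^n, b_i^n), \qquad k_n < \infty,$$

where $a_i^n$, $b_i^n$ are finite. Further note that, for a given $\varepsilon > 0$ and any semi-interval
$[a, b)$, one can always find an embedded interval $[a, b - \delta)$, $\delta > 0$, such that
$P([a, b - \delta)) \ge P([a, b)) - \varepsilon$. This follows directly from property F3: $F(b - \delta) \to F(b)$
as $\delta \downarrow 0$. Hence, for a given $\varepsilon > 0$ and set $B_n$, there exist $\delta_i^n > 0$, $i = 1, \ldots, k_n$,
such that

$$\widetilde{B}_n = \bigcup_{i=1}^{k_n} [a_i^n, b_i^n - \delta_i^n) \subset B_n, \qquad P(\widetilde{B}_n) > P(B_n) - \varepsilon 2^{-n}.$$

Now add the right end points of the semi-intervals to the set $\widetilde{B}_n$ and consider the
closed bounded set

$$K_n = \bigcup_{i=1}^{k_n} [a_i^n, b_i^n - \delta_i^n].$$

Clearly,

$$\widetilde{B}_n \subset K_n \subset B_n, \qquad K = \bigcap_{n=1}^{\infty} K_n = \emptyset, \qquad P(B_n - K_n) = P(B_n \overline{K}_n) \le \varepsilon 2^{-n}.$$

It follows from the relation $K = \emptyset$ that $\bigcap_{n=1}^{n_0} K_n = \emptyset$ for all sufficiently large $n_0$. Indeed,
all the sets $K_n$ belong to the closure $[C_N] = [-N, N]$ which is compact. The sets
$\{\Delta_n = [C_N] - K_n\}_{n=1}^{\infty}$ form an open covering of $[C_N]$, since

$$\bigcup_n \Delta_n = [C_N] \Big( \bigcup_n \overline{K}_n \Big) = [C_N] \overline{\Big( \bigcap_n K_n \Big)} = [C_N].$$

Thus, by the Heine-Borel lemma there exists a finite subcovering $\{\Delta_n\}_{n=1}^{n_0}$, $n_0 < \infty$,
such that $\bigcup_{n=1}^{n_0} \Delta_n = [C_N]$ or, which is the same, $\bigcap_{n=1}^{n_0} K_n = \emptyset$. Therefore

$$P(B_{n_0}) = P\Big( B_{n_0} \overline{\Big( \bigcap_{n=1}^{n_0} K_n \Big)} \Big) = P\Big( B_{n_0} \Big( \bigcup_{n=1}^{n_0} \overline{K}_n \Big) \Big) = P\Big( \bigcup_{n=1}^{n_0} B_{n_0} \overline{K}_n \Big) \le P\Big( \bigcup_{n=1}^{n_0} B_n \overline{K}_n \Big) \le \sum_{n=1}^{n_0} \varepsilon 2^{-n} < \varepsilon.$$

Thus, for a given $\varepsilon > 0$ we found an $n_0$ (depending on $\varepsilon$) such that $P(B_{n_0}) < \varepsilon$.
This means that $P(B_n) \to 0$ as $n \to \infty$. We proved that axiom P3 holds.

So we have constructed a probability space. It remains to take $\xi$ to be the identity
mapping of $\mathbb{R}$ onto itself. Then

$$F_\xi(x) = P(\xi < x) = P((-\infty, x)) = F(x). \qquad □$$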


The  model  of  the  sample  probability  space  based  on  the  assertion  just  proved  is 
often  used  in  studies  of  distribution  functions. 


Definition 3.2.1 A probability space $(\Omega, \mathfrak{F}, \mathbf{F})$ is called a sample space for a random
variable $\xi(\omega)$ if $\Omega$ is a subset of the real line $\mathbb{R}$ and $\xi(\omega) = \omega$.

The probability $\mathbf{F} = \mathbf{F}_\xi$ is called, in accordance with Definition 3.1.2 from
Sect. 3.1, the distribution of $\xi$. We will write this as

$$\xi \subset\!\!\!= \mathbf{F}. \tag{3.2.1}$$

It is obvious that constructing a sample probability space is always possible. It
suffices to put $\Omega = \mathbb{R}$, $\mathfrak{F} = \mathfrak{B}$, $\mathbf{F}(B) = P(\xi \in B)$. For integer-valued variables
$\xi$ the space $(\Omega, \mathfrak{F})$ can be chosen in a more "economical" way by taking
$\Omega = \{\ldots, -1, 0, 1, \ldots\}$.

Since by Theorem 3.2.1 the distribution function $F(x)$ of a random variable $\xi$
uniquely specifies the distribution $\mathbf{F}$ of this random variable, along with (3.2.1) we
will also write $\xi \subset\!\!\!= F$.

Now we will give examples of some of the most common distributions.


3.2.2  The  Most  Common  Distributions 


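1. The degenerate distribution $\mathbf{I}_a$. The distribution $\mathbf{I}_a$ is defined by

$$\mathbf{I}_a(B) = \begin{cases} 1 & \text{if } a \in B, \\ 0 & \text{if } a \notin B. \end{cases}$$

This distribution is concentrated at the point $a$: if $\xi \subset\!\!\!= \mathbf{I}_a$, then $P(\xi = a) = 1$. The
distribution function of $\mathbf{I}_a$ has the form

$$F(x) = \begin{cases} 0 & \text{for } x \le a, \\ 1 & \text{for } x > a. \end{cases}$$

The next two distributions were described in Examples 3.1.1 and 3.1.2 of
Sect. 3.1.

2. The binomial distribution $\mathbf{B}^n_p$. By the definition, $\xi \subset\!\!\!= \mathbf{B}^n_p$ ($n > 0$ is an integer,
$p \in (0, 1)$) if $P(\xi = k) = \binom{n}{k} p^k (1 - p)^{n-k}$, $0 \le k \le n$. The distribution $\mathbf{B}^1_p$ will be
denoted by $\mathbf{B}_p$.

3. The uniform distribution $\mathbf{U}_{a,b}$. If $\xi \subset\!\!\!= \mathbf{U}_{a,b}$ then

$$P(\xi \in B) = \frac{\mu(B \cap [a, b])}{\mu([a, b])},$$

where $\mu$ is the Lebesgue measure. We saw that this distribution has distribution
function (3.1.1).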

The next distribution plays a special role in probability theory, and we will encounter
it many times.

4. The normal distribution $\boldsymbol{\Phi}_{a,\sigma^2}$ (the normal or Gaussian law). We will write
$\xi \subset\!\!\!= \boldsymbol{\Phi}_{a,\sigma^2}$ if

$$P(\xi \in B) = \boldsymbol{\Phi}_{a,\sigma^2}(B) = \frac{1}{\sigma\sqrt{2\pi}} \int_B e^{-(u-a)^2/(2\sigma^2)} \, du. \tag{3.2.2}$$

The distribution $\boldsymbol{\Phi}_{a,\sigma^2}$ depends on two parameters: $a$ and $\sigma > 0$. If $a = 0$, $\sigma = 1$, the
normal distribution is called standard. The distribution function of $\boldsymbol{\Phi}_{0,1}$ is equal to

$$\Phi(x) = \boldsymbol{\Phi}_{0,1}((-\infty, x)) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2} \, du.$$

The distribution function of $\boldsymbol{\Phi}_{a,\sigma^2}$ is obviously equal to $\Phi((x - a)/\sigma)$, so that the
parameters $a$ and $\sigma$ have the meaning of the "location" and "scale" of the distribution.

The fact that formula (3.2.2) defines a distribution follows from Theorem 3.2.1
and the observation that the function $\Phi(x)$ (or $\Phi((x - a)/\sigma)$) satisfies properties
F1-F3, since $\Phi(-\infty) = 0$, $\Phi(\infty) = 1$, and $\Phi(x)$ is continuous and monotone. One
could also directly use the fact that the integral in (3.2.2) is a countably additive set
function (see Sect. 3.6 and Appendix 3).
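The location and scale relation for the normal distribution function is easy to evaluate numerically. The sketch below expresses $\Phi$ through the error function from Python's standard `math` module, using the identity $\Phi(x) = (1 + \operatorname{erf}(x/\sqrt{2}))/2$; the function names are assumptions made for this illustration.

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def normal_cdf(x, a, sigma):
    """Distribution function of the normal law with location a and scale sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return Phi((x - a) / sigma)
```

Since $\Phi$ is continuous, it does not matter here whether the distribution function is defined through $P(\xi < x)$ or $P(\xi \le x)$.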

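5. The exponential distribution $\boldsymbol{\Gamma}_\alpha$. The relation $\xi \subset\!\!\!= \boldsymbol{\Gamma}_\alpha$ means that $\xi$ is nonnegative
and

$$P(\xi \in B) = \boldsymbol{\Gamma}_\alpha(B) = \alpha \int_{B \cap (0, \infty)} e^{-\alpha u} \, du.$$

The distribution function of $\xi \subset\!\!\!= \boldsymbol{\Gamma}_\alpha$ clearly has the form

$$P(\xi < x) = \begin{cases} 1 - e^{-\alpha x} & \text{for } x \ge 0, \\ 0 & \text{for } x < 0. \end{cases}$$

The exponential distribution is a special case of the gamma distribution $\boldsymbol{\Gamma}_{\alpha,\lambda}$, to be
considered in more detail in Sect. 7.7.

6. A discrete analogue of the exponential distribution is called the geometric
distribution. It has the form

$$P(\xi = k) = (1 - p) p^k, \qquad p \in (0, 1), \quad k = 0, 1, \ldots$$

7. The Cauchy distribution $\mathbf{K}_{a,\sigma}$. As was the case with the normal distribution,
this distribution depends on two parameters $a$ and $\sigma$ which are also location and
scale parameters. If $\xi \subset\!\!\!= \mathbf{K}_{a,\sigma}$ then

$$P(\xi \in B) = \frac{1}{\pi\sigma} \int_B \frac{du}{1 + ((u - a)/\sigma)^2}.$$

The distribution function $K(x)$ of $\mathbf{K}_{0,1}$ is

$$K(x) = \frac{1}{\pi} \int_{-\infty}^{x} \frac{du}{1 + u^2}.$$

The distribution function of $\mathbf{K}_{a,\sigma}$ is equal to $K((x - a)/\sigma)$. All the remarks made
for the normal distribution continue to hold here.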


Example 3.2.1 Suppose that there is a source of radiation at a point $(a, \sigma)$, $\sigma > 0$,
on the plane. The radiation is registered by a detector whose position coincides with
the $x$-axis. An emitted particle moves in a random direction distributed uniformly
over the circle. In other words, the angle $\eta$ between this direction and the vector
$(0, -1)$ has the uniform distribution $\mathbf{U}_{-\pi,\pi}$ on the interval $[-\pi, \pi]$. Observation
results are the coordinates $\xi_1, \xi_2, \ldots$ of the points on the $x$-axis where the particles
interacted with the detector. What is the distribution of the random variable $\xi = \xi_1$?

To find this distribution, consider a particle emitted at the point $(a, \sigma)$ given
that the particle hit the detector (i.e. given that $\eta \in [-\pi/2, \pi/2]$). It is clear that
the conditional distribution of $\eta$ given the last event (of which the probability is
$P(\eta \in [-\pi/2, \pi/2]) = 1/2$) coincides with $\mathbf{U}_{-\pi/2,\pi/2}$. Since $(\xi - a)/\sigma = \tan \eta$,
one obtains that

$$P(\xi < x) = P(a + \sigma \tan \eta < x) = P\Big( \frac{\eta}{\pi} < \frac{1}{\pi} \arctan \frac{x - a}{\sigma} \Big) = \frac{1}{2} + \frac{1}{\pi} \arctan \frac{x - a}{\sigma}.$$

Recalling that $(\arctan u)' = 1/(1 + u^2)$, we have

$$\arctan x = \int_0^x \frac{du}{1 + u^2} = \int_{-\infty}^{x} \frac{du}{1 + u^2} - \frac{\pi}{2},$$

$$P(\xi < x) = \frac{1}{\pi} \int_{-\infty}^{(x-a)/\sigma} \frac{du}{1 + u^2} = K\Big( \frac{x - a}{\sigma} \Big).$$

Thus the coordinates of the traces on the $x$-axis of the particles emitted from the
point $(a, \sigma)$ have the Cauchy distribution $\mathbf{K}_{a,\sigma}$.
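The derivation in Example 3.2.1 can be checked by a small simulation. In the sketch below the parameter values, the sample size and the seed are arbitrary assumptions; the directions are sampled conditionally on the particle hitting the detector, i.e. uniformly on $(-\pi/2, \pi/2)$.

```python
import math
import random

def cauchy_cdf(x, a, sigma):
    """K((x - a) / sigma) = 1/2 + arctan((x - a) / sigma) / pi."""
    return 0.5 + math.atan((x - a) / sigma) / math.pi

def simulate_hits(n, a, sigma, seed=0):
    """Coordinates on the x-axis of n particles emitted from (a, sigma)."""
    rng = random.Random(seed)
    hits = []
    for _ in range(n):
        # Direction angle, given that the particle hits the detector.
        eta = rng.uniform(-math.pi / 2, math.pi / 2)
        hits.append(a + sigma * math.tan(eta))
    return hits

a, sigma = 1.0, 2.0            # illustrative source position
hits = simulate_hits(100_000, a, sigma)
x = 3.0
empirical = sum(h < x for h in hits) / len(hits)  # estimate of P(xi < x)
```

For these values $K((x - a)/\sigma) = 1/2 + \arctan(1)/\pi = 3/4$, and the empirical frequency should be close to this number.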

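8. The Poisson distribution $\boldsymbol{\Pi}_\lambda$. We will write $\xi \subset\!\!\!= \boldsymbol{\Pi}_\lambda$ if $\xi$ assumes nonnegative
integer values with probabilities

$$P(\xi = m) = \frac{\lambda^m}{m!} e^{-\lambda}, \qquad \lambda > 0, \quad m = 0, 1, 2, \ldots$$

The distribution function, as in Example 3.1.1, has the form of a sum:

$$F(x) = \begin{cases} \sum_{m < x} \frac{\lambda^m}{m!} e^{-\lambda} & \text{for } x > 0, \\ 0 & \text{for } x \le 0. \end{cases}$$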

3.2.3  The  Three  Distribution  Types 

All  the  distributions  considered  in  the  above  examples  can  be  divided  into  two  types. 

I.  Discrete  Distributions 

Definition 3.2.2 The distribution of a random variable $\xi$ is called discrete if $\xi$ can
assume only finitely or countably many values $x_1, x_2, \ldots$ so that

$$p_k = P(\xi = x_k) > 0, \qquad \sum_k p_k = 1.$$

A discrete distribution $\{p_k\}$ can obviously always be defined on a discrete probability
space. It is often convenient to characterise such a distribution by a table:

Values          x_1    x_2    x_3    ...
Probabilities   p_1    p_2    p_3    ...

The distributions $\mathbf{I}_a$, $\mathbf{B}^n_p$, $\boldsymbol{\Pi}_\lambda$, and the geometric distribution are discrete. The
derivative of the distribution function of such a distribution is equal to zero everywhere
except at the points $x_1, x_2, \ldots$ where $F(x)$ is discontinuous, the jumps being

$$F(x_k + 0) - F(x_k) = p_k.$$

An important class of discrete distributions is formed by lattice distributions.

Definition 3.2.3 We say that random variable $\xi$ has a lattice distribution with span
$h$ if there exist $a$ and $h$ such that

$$\sum_{k=-\infty}^{\infty} P(\xi = a + kh) = 1. \tag{3.2.3}$$

If $h$ is the greatest number satisfying (3.2.3) and the number $a$ lies in the interval
$[0, h)$ then these numbers are called the span and the shift, respectively, of the lattice.

If $a = 0$ and $h = 1$ then the distribution is called arithmetic. The same terms will
be used for random variables.

Obviously the greatest common divisor (g.c.d.) of all possible values of an arithmetic
random variable equals 1.
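For a random variable taking finitely many integer values, the span and the shift can be found with elementary arithmetic. The following sketch rests on an assumption not stated in the text: that all possible values are integers, so that the greatest $h$ in (3.2.3) is the g.c.d. of the differences between the values. The function name `span_and_shift` is hypothetical.

```python
from functools import reduce
from math import gcd

def span_and_shift(values):
    """Span h and shift a in [0, h) of the lattice carrying the given integer values."""
    values = sorted(set(values))
    if len(values) < 2:
        raise ValueError("need at least two distinct values")
    # Every value must have the form a + k*h, so h divides all differences.
    h = reduce(gcd, (v - values[0] for v in values[1:]))
    return h, values[0] % h
```

For instance, the values 1, 4, 10 lie on the lattice $\{1 + 3k\}$, while the values 0, 2, 3 give span 1 and shift 0, i.e. an arithmetic distribution.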

II.  Absolutely  Continuous  Distributions 

Definition 3.2.4 The distribution $\mathbf{F}$ of a random variable $\xi$ is said to be absolutely
continuous² if, for any Borel set $B$,

$$\mathbf{F}(B) = P(\xi \in B) = \int_B f(x) \, dx, \tag{3.2.4}$$

where $f(x) \ge 0$, $\int_{-\infty}^{\infty} f(x) \, dx = 1$.

The function $f(x)$ in (3.2.4) is called the density of the distribution.

It is not hard to derive from the proof of Theorem 3.2.1 (to be more precise, from
the theorem on uniqueness of the extension of a measure) that the above definition
of absolute continuity is equivalent to the representation

$$F_\xi(x) = \int_{-\infty}^{x} f(u) \, du$$

for all $x \in \mathbb{R}$. Distribution functions with this property are also called absolutely
continuous.


² The definition refers to absolute continuity with respect to the Lebesgue measure. Given a measure
$\mu$ on $(\mathbb{R}, \mathfrak{B})$ (see Appendix 3), a distribution $\mathbf{F}$ is called absolutely continuous with respect to $\mu$
if, for any $B \in \mathfrak{B}$, one has

$$\mathbf{F}(B) = \int_B f(x) \, \mu(dx).$$

In this sense discrete distributions are also absolutely continuous, but with respect to the counting
measure $m$. Indeed, if one puts $f(x_k) = p_k$, $m(B) = \{$the number of points from the set
$(x_1, x_2, \ldots)$ which are in $B\}$, then

$$\mathbf{F}(B) = \sum_{x_k \in B} p_k = \sum_{x_k \in B} f(x_k) = \int_B f(x) \, m(dx)$$

(see Appendix 3).

Fig. 3.1 The plot shows the result of the first three steps in the construction of the Cantor function


The function $f(x)$ is determined by the above equalities up to its values on a set
of Lebesgue measure 0. For this function, the relation $f(x) = dF(x)/dx$ holds almost
everywhere (with respect to the Lebesgue measure).

The distributions $\mathbf{U}_{a,b}$, $\boldsymbol{\Phi}_{a,\sigma^2}$, $\mathbf{K}_{a,\sigma}$ and $\boldsymbol{\Gamma}_\alpha$ are absolutely continuous. The density
of the normal distribution with parameters $a$ and $\sigma$ is equal to

$$\phi_{a,\sigma^2}(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-a)^2/(2\sigma^2)}.$$

From their definitions, one could easily derive the densities of the distributions $\mathbf{U}_{a,b}$,
$\mathbf{K}_{a,\sigma}$ and $\boldsymbol{\Gamma}_\alpha$ as well. The density of $\mathbf{K}_{a,\sigma}$ has a shape resembling that of the normal
density, but with "thicker tails" (it vanishes more slowly as $|x| \to \infty$).

We will say that a distribution $\mathbf{F}$ has an atom at point $x_1$ if $\mathbf{F}(\{x_1\}) > 0$. We saw
that any discrete distribution consists of atoms but, for an absolutely continuous
distribution, the probability of hitting a set of zero Lebesgue measure is zero. It
turns out that there exists yet a third class of distributions which is characterised
by the negation of both mentioned properties of discrete and absolutely continuous
distributions.

III.  Singular  Distributions 

Definition  3.2.5  A  distribution  F  is  said  to  be  singular  (with  respect  to  Lebesgue 
measure)  if  it  has  no  atoms  and  is  concentrated  on  a  set  of  zero  Lebesgue  measure. 

Because a singular distribution has no atoms, its distribution function is continuous.
An example of such a distribution function is given by the famous Cantor function
of which the whole variation is concentrated on the interval $[0, 1]$: $F(x) = 0$
for $x \le 0$, $F(x) = 1$ for $x \ge 1$. It can be constructed as follows (the construction
process is shown in Fig. 3.1).


³ The assertion about the "almost everywhere" uniqueness of the function $f$ follows from the
Radon-Nikodym theorem (see Appendix 3).

Divide the segment $[0, 1]$ into three equal parts $[0, 1/3]$, $[1/3, 2/3]$, and $[2/3, 1]$.
On the inner segment put $F(x) = 1/2$. The remaining two segments are again divided
into three equal parts each, and on the inner parts one sets $F(x)$ to be $1/4$ and
$3/4$, respectively. Each of the remaining segments is divided in turn into three parts,
and $F(x)$ is defined on the inner parts as the arithmetic mean of the two already
defined neighbouring values of $F(x)$, and so on. At the points which do not belong
to such inner segments $F(x)$ is defined by continuity. It is not hard to see that the
total length of such "inner" segments on which $F(x)$ is constant is equal to

$$\frac{1}{3} + \frac{2}{9} + \frac{4}{27} + \cdots = \frac{1}{3} \sum_{k=0}^{\infty} \Big( \frac{2}{3} \Big)^k = \frac{1}{3} \cdot \frac{1}{1 - 2/3} = 1,$$

so that the function $F(x)$ grows on a set of measure zero but has no jumps.

From the construction of the Cantor distribution we see that $dF(x)/dx = 0$ almost
everywhere.
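The construction translates into a short procedure working with the ternary digits of $x$: a digit 1 means that $x$ lies in one of the "inner" segments, where $F$ is constant. The sketch below is an approximation under stated assumptions: it uses floating-point arithmetic and a finite number of digits (`depth`), and the function names are hypothetical.

```python
def cantor(x, depth=40):
    """Approximate the Cantor function F(x) from the first `depth` ternary digits of x."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)       # next ternary digit of x
        x -= digit
        if digit == 1:       # x is in a removed middle third: F is constant there
            return value + scale
        value += scale * (digit // 2)  # ternary digit 2 becomes binary digit 1
        scale /= 2
    return value

def removed_length(steps):
    """Total length of the inner segments after the given number of steps."""
    return sum(2**k / 3**(k + 1) for k in range(steps))
```

The values on the first inner segments agree with the construction above, and `removed_length` approaches 1 as the number of steps grows.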

It turns out that these three types of distribution exhaust all possibilities.

More precisely, there is a theorem due to Lebesgue stating that any distribution
function $F(x)$ can be represented in a unique way as a sum of three components:
discrete, absolutely continuous, and singular. Hence an arbitrary distribution
function cannot have more than a countable number of jumps (which can also be
observed directly: we will count all the jumps if we first enumerate all the jumps
which are greater than $1/2$, then the jumps greater than $1/3$, then greater than $1/4$,
etc.). This means, in particular, that $F(x)$ is everywhere continuous except perhaps
at a countable or finite set of points.

To conclude this section we will list several properties of distribution functions
and densities that arise when forming new random variables.


3.2.4  Distributions  of  Functions  of  Random  Variables 

For a given function $g(x)$, to find the distribution of $g(\xi)$ we have to impose some
measurability requirements on the function. The function $g(x)$ is called Borel if the
inverse image

$$g^{-1}(B) = \{x : g(x) \in B\}$$

of any Borel set $B$ is again a Borel set. For such a function $g$ the distribution function
of the random variable $\eta = g(\xi)$ equals

$$F_{g(\xi)}(x) = P(g(\xi) < x) = P(\xi \in g^{-1}((-\infty, x))).$$

⁴ See Sect. 3.5 in Appendix 3.

If $g(x)$ is continuous and strictly increasing on an interval $(a, b)$ then, on the
interval $(g(a), g(b))$, the inverse function $y = g^{(-1)}(x)$ is defined as the solution to
the equation $g(y) = x$.⁵ Since $g$ is a monotone mapping we have

$$\{g(\xi) < x\} = \{\xi < g^{(-1)}(x)\} \quad \text{for } x \in (g(a), g(b)).$$

Thus we get the following representation for $F_{g(\xi)}$ in terms of $F_\xi$: for
$x \in (g(a), g(b))$,

$$F_{g(\xi)}(x) = P(\xi < g^{(-1)}(x)) = F_\xi(g^{(-1)}(x)). \tag{3.2.5}$$

Putting $g = F_\xi$ we obtain, in particular, that if $F_\xi$ is continuous and strictly increasing
on $(a, b)$ and $F(a) = 0$, $F(b) = 1$ ($-a$ and $b$ may be $\infty$) then

$$F_\xi(g^{(-1)}(x)) = x$$

for $x \in [0, 1]$ and therefore the random variable $\eta = F_\xi(\xi)$ is uniformly distributed
over $[0, 1]$.
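This last observation can be illustrated by simulation. The sketch below takes the exponential distribution as an example of a continuous, strictly increasing distribution function on $(0, \infty)$; the parameter value, sample size and seed are arbitrary assumptions.

```python
import math
import random

def exp_cdf(x, alpha):
    """Distribution function of the exponential law with parameter alpha."""
    return 1.0 - math.exp(-alpha * x) if x > 0 else 0.0

rng = random.Random(1)
alpha = 2.0  # illustrative parameter
# eta = F(xi), where xi is exponentially distributed with parameter alpha.
etas = [exp_cdf(rng.expovariate(alpha), alpha) for _ in range(100_000)]
frac_below = sum(e < 0.3 for e in etas) / len(etas)  # estimate of P(eta < 0.3)
```

If $\eta$ is uniformly distributed over $[0, 1]$, the fraction of sample values below 0.3 should be close to 0.3.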

Definition 3.2.6 The quantile transform $F^{(-1)}(y)$ of an arbitrary distribution $\mathbf{F}$
with the distribution function $F(x)$ is the “generalised” inverse of the function $F$:

$$F^{(-1)}(y) := \sup\{x : F(x) < y\} \quad \text{for } y \in (0, 1];$$

$$F^{(-1)}(0) := \inf\{x : F(x) > 0\}.$$

In mathematical statistics, the number $F^{(-1)}(y)$ is called the quantile of order $y$
of the distribution $\mathbf{F}$. The function $F^{(-1)}$ has a discontinuity of size $b - a$ at a point
$y$ if $(a, b)$ is the interval on which $F$ is constant and such that $F(x) = y \in [0, 1)$.
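The generalised inverse can be made concrete for a distribution with finitely many atoms. The following is a minimal sketch, assuming the left-continuous convention $F(x) = \mathbf{P}(\xi < x)$ used in this chapter; the function name `quantile` and the sample atoms are illustrative choices, not part of the text.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate


def quantile(values, probs, y):
    """Generalised inverse F^(-1)(y) of a discrete distribution.

    values: atoms in increasing order; probs: their positive probabilities.
    With F(x) = P(xi < x), sup{x : F(x) < y} is the first atom whose
    cumulative probability is >= y.
    """
    if not 0 <= y <= 1:
        raise ValueError("y must lie in [0, 1]")
    cumulative = list(accumulate(probs))
    # For y = 0 this returns the smallest atom, i.e. inf{x : F(x) > 0}.
    return values[bisect_left(cumulative, y)]


# Hypothetical example: atoms 1, 2, 3 with probabilities 1/2, 1/4, 1/4.
values = [1, 2, 3]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
median = quantile(values, probs, Fraction(1, 2))
```

Exact fractions are used so that the comparison with the cumulative sums is not affected by rounding.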

Roughly speaking, the plot of the function $F^{(-1)}$ can be obtained from that of the
function $F(x)$ on the $(x, y)$ plane in the following way: rotate the $(x, y)$ plane in
the counter-clockwise direction by 90°, so that the $x$-axis becomes the ordinate axis,
but the $y$-axis becomes the abscissa axis directed to the left. To switch to normal
coordinates, we have to reverse the direction of the new $x$-axis.

Further, if $x$ is a point of continuity and a point of growth of the function $F$ (i.e.,
$F(x)$ is a point of continuity of $F^{(-1)}$) then $F^{(-1)}(y)$ is the unique solution of the
equation $F(x) = y$ and the equality $F(F^{(-1)}(y)) = y$ holds.

In  some  cases  the  following  statement  proves  to  be  useful. 

Theorem 3.2.2 Let $\eta \mathrel{\subset\!\!\!=} \mathbf{U}_{0,1}$. Then, for any distribution $\mathbf{F}$,

$$F^{(-1)}(\eta) \mathrel{\subset\!\!\!=} \mathbf{F}.$$

Proof If $F(x) > y$ then $F^{(-1)}(y) = \sup\{v : F(v) < y\} < x$, and vice versa: if
$F(x) < y$ then $F^{(-1)}(y) \ge x$ (recall that $F(x)$ is left-continuous). Therefore the
following inclusions are valid for the sets in the $(x, y)$ plane:

$$\{y < F(x)\} \subset \{F^{(-1)}(y) < x\} \subset \{y \le F(x)\}.$$

Substituting $\eta \mathrel{\subset\!\!\!=} \mathbf{U}_{0,1}$ in place of $y$ in these relations yields that, for any $x$, such
inclusions hold for the respective events, and hence

$$\mathbf{P}\bigl(F^{(-1)}(\eta) < x\bigr) = \mathbf{P}(\eta < F(x)) = F(x).$$

The theorem is proved. □

⁵ For an arbitrary non-decreasing function $g$, the inverse function $g^{(-1)}(x)$ is defined by the
equation

$$g^{(-1)}(y) := \inf\{x : g(x) \ge y\} = \sup\{x : g(x) < y\}.$$


Thus we have obtained an important method for constructing random variables
with prescribed distributions from uniformly distributed random variables. For
instance, if $\eta \mathrel{\subset\!\!\!=} \mathbf{U}_{0,1}$ then $\xi = -(1/\alpha) \ln \eta \mathrel{\subset\!\!\!=} \boldsymbol{\Gamma}_\alpha$.
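The construction for the exponential law can be sketched in a few lines. This is an illustration only: the helper names, the rate 2.0, the seed and the sample size are assumptions made for the example, and the standard-library generator stands in for a uniformly distributed $\eta$.

```python
import math
import random


def exp_cdf(x, alpha):
    """Distribution function of the exponential law with rate alpha."""
    return 1.0 - math.exp(-alpha * x) if x > 0 else 0.0


def exp_quantile(y, alpha):
    """Inverse of exp_cdf on [0, 1): solves 1 - exp(-alpha * x) = y."""
    return -math.log(1.0 - y) / alpha


def sample_exponential(alpha, rng):
    """The shortcut -(1/alpha) ln(eta); eta and 1 - eta are both uniform."""
    eta = 1.0 - rng.random()  # lies in (0, 1], so the logarithm is finite
    return -math.log(eta) / alpha


rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility
draws = [sample_exponential(2.0, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)  # should be close to 1 / alpha = 0.5
```

The quantile function uses $1 - y$ while the sampler uses $\eta$ directly; both are legitimate because $\eta$ and $1 - \eta$ have the same distribution.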

In another special case, when $g(x) = a + bx$, $b > 0$, from (3.2.5) we get $F_{g(\xi)}(x) =
F_\xi((x - a)/b)$. We have already used this relation to some extent when considering
the  distributions  a2  and  . 

If a function $g$ is strictly increasing and differentiable (the inverse function $g^{(-1)}$
is defined in this case), and $\xi$ has a density $f(x)$, then there exists a density for $g(\xi)$
which is equal to

$$f_{g(\xi)}(y) = f\bigl(g^{(-1)}(y)\bigr)\bigl(g^{(-1)}(y)\bigr)' = f(x)\,\frac{dx}{dy},$$

where $x = g^{(-1)}(y)$, $y = g(x)$. A similar argument for decreasing $g$ leads to the
general formula

$$f_{g(\xi)}(y) = f(x)\,\left|\frac{dx}{dy}\right|.$$

For $g(x) = a + bx$, $b \ne 0$, one obtains

$$f_{a+b\xi}(y) = \frac{1}{|b|}\, f\!\left(\frac{y - a}{b}\right).$$
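As a quick numerical illustration of the density formula, the sketch below applies it to $g(x) = e^x$ with a standard normal $\xi$ and to the affine case. The choice of $g$, of the normal density and all function names are assumptions made for the example.

```python
import math


def normal_density(x):
    """Standard normal density f(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def density_of_exp_xi(y):
    """Density of g(xi) = exp(xi) via f(x) |dx/dy| with x = ln y, dx/dy = 1/y."""
    if y <= 0:
        return 0.0
    x = math.log(y)
    return normal_density(x) * abs(1.0 / y)


def density_of_affine(y, a, b):
    """Density of a + b*xi for b != 0: f((y - a) / b) / |b|."""
    return normal_density((y - a) / b) / abs(b)


# Midpoint-rule check: the mass of (0, 2) must equal P(xi < ln 2).
steps = 20_000
h = 2.0 / steps
mass = sum(density_of_exp_xi((i + 0.5) * h) for i in range(steps)) * h
expected = 0.5 * (1.0 + math.erf(math.log(2.0) / math.sqrt(2.0)))
```

Integrating the transformed density over $(0, 2)$ and comparing with $\mathbf{P}(\xi < \ln 2)$ ties the formula back to (3.2.5).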


3.3  Multivariate  Random  Variables 

Let $\xi_1, \xi_2, \ldots, \xi_n$ be random variables given on a common probability space
$\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$. To each $\omega$, these random variables put into correspondence an
$n$-dimensional vector $\xi(\omega) = (\xi_1(\omega), \xi_2(\omega), \ldots, \xi_n(\omega))$.

Definition 3.3.1 A mapping $\Omega \to \mathbb{R}^n$ given by random variables $\xi_1, \xi_2, \ldots, \xi_n$ is
called a random vector or multivariate random variable.

Such a mapping $\Omega \to \mathbb{R}^n$ is a measurable mapping of the space $\langle \Omega, \mathfrak{F} \rangle$ into the
space $\langle \mathbb{R}^n, \mathfrak{B}^n \rangle$, where $\mathfrak{B}^n$ is the σ-algebra of Borel sets in $\mathbb{R}^n$. Therefore, for Borel
sets $B$, the function $\mathbf{F}_\xi(B) = \mathbf{P}(\xi \in B)$ is defined.

Definition 3.3.2 The function $\mathbf{F}_\xi(B)$ is called the distribution of the vector $\xi$.

The function

$$F_{\xi_1 \ldots \xi_n}(x_1, \ldots, x_n) = \mathbf{P}(\xi_1 < x_1, \ldots, \xi_n < x_n)$$

is called the distribution function of the random vector $(\xi_1, \ldots, \xi_n)$ or joint
distribution function of the random variables $\xi_1, \ldots, \xi_n$.

The following properties of the distribution functions of random vectors,
analogous to properties F1-F3 in Sect. 3.2, hold true.

FF1. Monotonicity: “Multiple” differences of the values of the function $F_{\xi_1 \ldots \xi_n}$,
which correspond to probabilities of hitting arbitrary “open at the right”
parallelepipeds, are nonnegative. For instance, in the two-dimensional case this means
that, for any $x_1 < x_2$, $y_1 < y_2$ (the points $(x_1, y_1)$ and $(x_2, y_2)$ being the “extreme”
vertices of the parallelepiped),

$$F_{\xi_1,\xi_2}(x_2, y_2) - F_{\xi_1,\xi_2}(x_2, y_1) - \bigl(F_{\xi_1,\xi_2}(x_1, y_2) - F_{\xi_1,\xi_2}(x_1, y_1)\bigr) \ge 0.$$

This double difference is nothing else but the probability of hitting the “semi-open”
parallelepiped $[x_1, x_2) \times [y_1, y_2)$ by the vector $(\xi_1, \xi_2)$.

In other words, the differences

$$F_{\xi_1,\xi_2}(t, y_2) - F_{\xi_1,\xi_2}(t, y_1) \quad \text{for } y_1 < y_2$$

must be monotone in $t$. (For this to hold, the monotonicity of the function
$F_{\xi_1,\xi_2}(t, y_1)$ is not sufficient.)
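The double difference in FF1 is easy to evaluate directly. The sketch below, with hypothetical function names, computes it for two independent uniform variables and for a function that is non-decreasing in each argument but still fails FF1, which illustrates the closing remark in parentheses.

```python
def rectangle_probability(F, x1, x2, y1, y2):
    """Double difference of a joint distribution function F over [x1, x2) x [y1, y2)."""
    return F(x2, y2) - F(x2, y1) - (F(x1, y2) - F(x1, y1))


def clip01(t):
    """Restrict t to the interval [0, 1]."""
    return min(max(t, 0.0), 1.0)


def uniform_square_cdf(x, y):
    """Joint distribution function of two independent uniform [0, 1] variables."""
    return clip01(x) * clip01(y)


def not_a_cdf(x, y):
    """Non-decreasing in each argument, yet violates FF1 (illustrative only)."""
    return 1.0 if x + y >= 1.0 else 0.0


ok = rectangle_probability(uniform_square_cdf, 0.2, 0.5, 0.1, 0.4)
bad = rectangle_probability(not_a_cdf, 0.0, 1.0, 0.0, 1.0)
```

For the uniform square the result is the area of the rectangle, while the second function assigns a negative "probability" to the unit square and therefore cannot be a distribution function.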

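FF2. The second property can be called consistency:

$$\lim_{x_n \to \infty} F_{\xi_1 \ldots \xi_n}(x_1, \ldots, x_n) = F_{\xi_1 \ldots \xi_{n-1}}(x_1, \ldots, x_{n-1}),$$

$$\lim_{x_n \to -\infty} F_{\xi_1 \ldots \xi_n}(x_1, \ldots, x_n) = 0.$$

FF3. Left-continuity:

$$\lim_{x'_n \uparrow x_n} F_{\xi_1 \ldots \xi_n}(x_1, \ldots, x'_n) = F_{\xi_1 \ldots \xi_n}(x_1, \ldots, x_n).$$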

That  the  limits  in  properties  FF2  and  FF3  are  taken  in  the  last  variable  is  inessential, 
for  one  can  always  renumber  the  components  of  the  vectors. 

One  can  prove  these  properties  in  the  same  way  as  in  the  one-dimensional  case. 
As above, any function $F(x_1, \ldots, x_n)$ possessing this collection of properties will
be the distribution function of a (multivariate) random variable.

As in the one-dimensional case, when considering random vectors $\xi =
(\xi_1, \ldots, \xi_n)$, we can make use of the simplest sample model of the probability space
$\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$. Namely, let $\Omega$ coincide with $\mathbb{R}^n$ and $\mathfrak{F} = \mathfrak{B}^n$ be the σ-algebra of Borel
sets. We will complete the construction of the required probability space if we put
$\mathbf{F}(B) = \mathbf{F}_\xi(B) = \mathbf{P}(\xi \in B)$ for any $B \in \mathfrak{B}^n$. It remains to define the random
variable as the value of the elementary event itself, i.e. to put $\xi(\omega) = \omega$, where $\omega$ is a
point in $\mathbb{R}^n$.

It is not hard to see that the distribution function $F_{\xi_1 \ldots \xi_n}$ uniquely determines the
distribution $\mathbf{F}_\xi(B)$. Indeed, $F_{\xi_1 \ldots \xi_n}$ defines a probability on the algebra $\mathcal{A}$
generated by rectangles $\{a_i \le x_i < b_i;\ i = 1, \ldots, n\}$. For example, in the two-dimensional
case

$$\begin{aligned}
\mathbf{P}(a_1 \le \xi_1 < b_1,\ a_2 \le \xi_2 < b_2)
&= \mathbf{P}(\xi_1 < b_1,\ a_2 \le \xi_2 < b_2) - \mathbf{P}(\xi_1 < a_1,\ a_2 \le \xi_2 < b_2) \\
&= \bigl[F_{\xi_1,\xi_2}(b_1, b_2) - F_{\xi_1,\xi_2}(b_1, a_2)\bigr] - \bigl[F_{\xi_1,\xi_2}(a_1, b_2) - F_{\xi_1,\xi_2}(a_1, a_2)\bigr].
\end{aligned}$$

But $\mathfrak{B}^n = \sigma(\mathcal{A})$, and it remains to make use of the measure extension theorem (see
Sect. 3.2.1).

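Thus from a distribution function $F_{\xi_1 \ldots \xi_n} = F$ one can always construct a sample
probability space $\langle \mathbb{R}^n, \mathfrak{B}^n, \mathbf{F}_\xi \rangle$ and a random variable $\xi(\omega) = \omega$ on it so that the
latter will have the prescribed distribution $\mathbf{F}_\xi$.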

As  in  the  one-dimensional  case,  we  say  that  the  distribution  of  a  random  vector 
is  discrete  if  the  random  vector  assumes  at  most  a  countable  set  of  values. 

The distribution of a random vector will be absolutely continuous if, for any
Borel set $B \subset \mathbb{R}^n$,

$$\mathbf{F}_\xi(B) = \mathbf{P}(\xi \in B) = \int_B f(x)\,dx,$$

where clearly $f(x) \ge 0$ and $\int_{\mathbb{R}^n} f(x)\,dx = 1$.

This definition can be replaced with an equivalent one requiring that

$$F_{\xi_1 \ldots \xi_n}(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f(t_1, \ldots, t_n)\,dt_1 \cdots dt_n. \tag{3.3.1}$$

Indeed, if (3.3.1) holds, we define a countably additive set function

$$Q(B) = \int_B f(x)\,dx$$

(see properties of integrals in Appendix 3), which will coincide on rectangles
with $\mathbf{F}_\xi$. Consequently, $\mathbf{F}_\xi(B) = Q(B)$.

The function $f(x)$ is called the density of the distribution of $\xi$ or density of the
joint distribution of $\xi_1, \ldots, \xi_n$. The equality

$$\frac{\partial^n}{\partial x_1 \cdots \partial x_n}\, F_{\xi_1 \ldots \xi_n}(x_1, \ldots, x_n) = f(x_1, \ldots, x_n)$$

holds for this function almost everywhere.

If a random vector $\xi$ has density $f(x_1, \ldots, x_n)$, then clearly any “subvector”
$(\xi_{k_1}, \ldots, \xi_{k_s})$, $k_i \le n$, also has a density equal (let for the sake of simplicity $k_i = i$,
$i = 1, \ldots, s$) to

$$f(x_1, \ldots, x_s) = \int f(x_1, \ldots, x_n)\,dx_{s+1} \cdots dx_n.$$

Let continuously differentiable functions $y_i = g_i(x_1, \ldots, x_n)$ be given in a region
$A \subset \mathbb{R}^n$. Suppose they are univalently resolvable for $x_1, \ldots, x_n$: there exist functions
$x_i = g_i^{(-1)}(y_1, \ldots, y_n)$, and the Jacobian $J = |\partial x_i / \partial y_j| \ne 0$ in $A$. Denote by $B$ the
image of $A$ in the range of $(y_1, \ldots, y_n)$. Suppose further that a random vector $\xi =
(\xi_1, \ldots, \xi_n)$ has a density $f_\xi(x)$. Then $\eta_i = g_i(\xi_1, \ldots, \xi_n)$ will be random variables
with a joint density which, at a point $(y_1, \ldots, y_n) \in B$, is equal to

$$f_\eta(y_1, \ldots, y_n) = f_\xi(x_1, \ldots, x_n)\,|J|; \tag{3.3.2}$$

moreover

$$\begin{aligned}
\mathbf{P}(\xi \in A) &= \int_A f_\xi(x_1, \ldots, x_n)\,dx_1 \cdots dx_n
= \int_B f_\xi(x_1, \ldots, x_n)\,|J|\,dy_1 \cdots dy_n \\
&= \int_B f_\eta(y_1, \ldots, y_n)\,dy_1 \cdots dy_n = \mathbf{P}(\eta \in B).
\end{aligned} \tag{3.3.3}$$


This  is  clearly  an  extension  to  the  multi-dimensional  case  of  the  property  of  densities 
discussed  at  the  end  of  Sect.  3.2.  Formula  (3.3.3)  for  integrals  is  well-known  in 
calculus  as  the  change  of  variables  formula  and  could  serve  as  a  proof  of  (3.3.2). 

The distribution of a random vector $\xi$ is called singular if the distribution has
no atoms ($\mathbf{F}_\xi(\{x\}) = 0$ for any $x \in \mathbb{R}^n$) and is concentrated on a set of zero Lebesgue
measure.

Consider the following two important examples of multivariate distributions (we
continue the list of the most common distributions from Sect. 3.2).

9. The multinomial distribution $\mathbf{B}^n_p$. We use here the same symbol $\mathbf{B}^n_p$ as we used
for the binomial distribution. The only difference is that now by $p$ we understand a
vector $p = (p_1, \ldots, p_r)$, $p_j \ge 0$, $\sum_{j=1}^{r} p_j = 1$, which could be interpreted as the
collection of probabilities of disjoint events $A_j$, $\bigcup A_j = \Omega$. For an integer-valued
random vector $\nu = (\nu_1, \ldots, \nu_r)$, we will write $\nu \mathrel{\subset\!\!\!=} \mathbf{B}^n_p$ if for $k = (k_1, \ldots, k_r)$, $k_j \ge 0$,
$\sum_{j=1}^{r} k_j = n$ one has

$$\mathbf{P}(\nu = k) = \frac{n!}{k_1! \cdots k_r!}\, p_1^{k_1} \cdots p_r^{k_r}. \tag{3.3.4}$$

On the right-hand side we have a term from the expansion of the polynomial
$(p_1 + \cdots + p_r)^n$ into powers of $p_1, \ldots, p_r$. This explains the name of the distribution. If
$p$ is a number, then evidently $\mathbf{B}^n_p = \mathbf{B}^n_{(p,\,1-p)}$, so that the binomial distribution is a
multinomial distribution with $r = 2$.

The numbers $\nu_j$ could be interpreted as the frequencies of the occurrence of
events $A_j$ in $n$ independent trials, the probability of occurrence of $A_j$ in a trial
being $p_j$. Indeed, the probability of any fixed sequence of outcomes containing
$k_1, \ldots, k_r$ outcomes $A_1, \ldots, A_r$, respectively, is equal to $p_1^{k_1} \cdots p_r^{k_r}$, and the number
of different sequences of this kind is equal to $n!/(k_1! \cdots k_r!)$ (of $n!$ permutations we
leave only those which differ by more than merely permutations of elements inside
the groups of $k_1, \ldots, k_r$ elements). The result will be the probability (3.3.4).
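Formula (3.3.4) translates directly into code. The sketch below is a minimal illustration with exact rational arithmetic; the function name and the particular vector $p$ are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def multinomial_pmf(k, p):
    """P(nu = k) from (3.3.4) for counts k and cell probabilities p."""
    n = sum(k)
    coefficient = factorial(n)
    for kj in k:
        coefficient //= factorial(kj)  # exact: partial quotients stay integers
    prob = Fraction(coefficient)
    for kj, pj in zip(k, p):
        prob *= Fraction(pj) ** kj
    return prob


# Hypothetical example with r = 3 cells and n = 4 trials.
p = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
n = 4
total = sum(
    multinomial_pmf(k, p)
    for k in product(range(n + 1), repeat=len(p))
    if sum(k) == n
)
```

Summing over all admissible $k$ reproduces the expansion of $(p_1 + \cdots + p_r)^n = 1$, and with $r = 2$ the function reduces to the binomial probabilities.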


Example 3.3.1 The simplest model of a chess tournament with two players could
be as follows. In each game, independently of the outcomes of the past games, the
1st player wins with probability $p$, loses with probability $q$, and makes a draw with
probability $1 - p - q$. In that case the probability that, in $n$ games, the 1st player
wins $i$ and loses $j$ games ($i + j \le n$), is

$$p(n; i, j) = \frac{n!}{i!\, j!\, (n - i - j)!}\, p^i q^j (1 - p - q)^{n-i-j}.$$

Suppose that the tournament goes on until one of the players wins $N$ games (and
thereby wins the tournament). If we denote by $\eta$ the duration of the tournament (the
number of games played before its end) then

$$\mathbf{P}(\eta = n) = \sum_{i=0}^{N-1} p(n - 1; N - 1, i)\, p + \sum_{i=0}^{N-1} p(n - 1; i, N - 1)\, q.$$

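10. The multivariate normal (or Gaussian) distribution $\boldsymbol{\Phi}_{a,\sigma^2}$. Let $a = (a_1,
\ldots, a_r)$ be a vector and $\sigma^2 = \|\sigma_{ij}\|$, $i, j = 1, \ldots, r$, a symmetric positive definite
matrix, and $A = \|a_{ij}\|$ the matrix inverse to $\sigma^2$: $\sigma^2 = A^{-1}$. We will say that a vector
$\xi = (\xi_1, \ldots, \xi_r)$ has the normal distribution: $\xi \mathrel{\subset\!\!\!=} \boldsymbol{\Phi}_{a,\sigma^2}$, if it has the density

$$\varphi_{a,\sigma^2}(x) = \frac{\sqrt{|A|}}{(2\pi)^{r/2}} \exp\left\{ -\frac{1}{2}\,(x - a)\,A\,(x - a)^T \right\}.$$

Here $T$ denotes transposition:

$$x A x^T = \sum_{i,j=1}^{r} a_{ij}\, x_i x_j.$$

It is not hard to verify that

$$\int \varphi_{a,\sigma^2}(x)\,dx_1 \cdots dx_r = 1$$

(see also Sect. 7.6).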
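For $r = 2$ the normalisation of the normal density can be checked numerically. This is only a sketch: the mean vector, the covariance matrix, the grid step and the integration window are arbitrary choices, and the function name is hypothetical.

```python
import math


def normal_density_2d(x, a, sigma2):
    """Density of the bivariate normal law with mean a and covariance matrix sigma2."""
    (s11, s12), (s21, s22) = sigma2
    det = s11 * s22 - s12 * s21
    # A is the inverse of sigma2; its determinant equals 1 / det.
    a11, a12, a21, a22 = s22 / det, -s12 / det, -s21 / det, s11 / det
    d1, d2 = x[0] - a[0], x[1] - a[1]
    quad = a11 * d1 * d1 + (a12 + a21) * d1 * d2 + a22 * d2 * d2
    return math.sqrt(1.0 / det) / (2.0 * math.pi) * math.exp(-0.5 * quad)


a = (1.0, -1.0)
sigma2 = ((1.0, 0.5), (0.5, 2.0))
h = 0.1  # grid step of the Riemann sum (an arbitrary choice)
total = sum(
    normal_density_2d((a[0] + i * h, a[1] + j * h), a, sigma2)
    for i in range(-100, 101)
    for j in range(-120, 121)
) * h * h
```

A plain Riemann sum suffices here because the integrand is smooth and decays rapidly, so the truncated grid captures essentially all of the mass.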


3.4  Independence  of  Random  Variables  and  Classes  of  Events 
3.4.1  Independence  of  Random  Vectors 

Definition 3.4.1 Random variables $\xi_1, \ldots, \xi_n$ are said to be independent if

$$\mathbf{P}(\xi_1 \in B_1, \ldots, \xi_n \in B_n) = \mathbf{P}(\xi_1 \in B_1) \cdots \mathbf{P}(\xi_n \in B_n) \tag{3.4.1}$$

for any Borel sets $B_1, \ldots, B_n$ on the real line.

One can introduce the notion of a sequence of independent random variables. The
random variables from the sequence $\{\xi_n\}_{n=1}^{\infty}$ given on a probability space $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$
are independent if (3.4.1) holds for any integer $n$, so that the independence of a
sequence of random variables reduces to that of any finite collection of random
variables from this sequence. As we will see below, for a sequence of independent
random variables, any two events related to disjoint groups of random variables
from the sequence are independent.

Another  possible  definition  of  independence  of  random  variables  follows  from 
the  assertion  below. 

Theorem 3.4.1 Random variables $\xi_1, \ldots, \xi_n$ are independent if and only if

$$F_{\xi_1 \ldots \xi_n}(x_1, \ldots, x_n) = F_{\xi_1}(x_1) \cdots F_{\xi_n}(x_n).$$

The  proof  of  the  theorem  is  given  in  the  third  part  of  the  present  section. 

An important criterion of independence in the case when the distribution of $\xi =
(\xi_1, \ldots, \xi_n)$ is absolutely continuous is given in the following theorem.

Theorem 3.4.2 Let random variables $\xi_1, \ldots, \xi_n$ have densities $f_1(x), \ldots, f_n(x)$,
respectively. Then for the independence of $\xi_1, \ldots, \xi_n$ it is necessary and sufficient
that the vector $\xi = (\xi_1, \ldots, \xi_n)$ has a density $f(x_1, \ldots, x_n)$ which is equal to

$$f(x_1, \ldots, x_n) = f_1(x_1) \cdots f_n(x_n).$$

Thus, if it turns out that the density of $\xi$ equals the product of densities of $\xi_j$, that
will mean that the random variables $\xi_j$ are independent.

We leave it to the reader to verify, using this theorem, that the components of a
normal vector $(\xi_1, \ldots, \xi_n)$ are independent if and only if $a_{ij} = 0$, $\sigma_{ij} = 0$ for $i \ne j$.

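Proof of Theorem 3.4.2 If the distribution function of the random variable $\xi_i$ is given
by

$$F_{\xi_i}(x_i) = \int_{-\infty}^{x_i} f_i(t_i)\,dt_i$$

and $\xi_i$ are independent, then the joint distribution function will be defined by the
formula

$$\begin{aligned}
F_{\xi_1 \ldots \xi_n}(x_1, \ldots, x_n) &= F_{\xi_1}(x_1) \cdots F_{\xi_n}(x_n) \\
&= \int_{-\infty}^{x_1} f_1(t_1)\,dt_1 \cdots \int_{-\infty}^{x_n} f_n(t_n)\,dt_n \\
&= \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f_1(t_1) \cdots f_n(t_n)\,dt_1 \cdots dt_n.
\end{aligned}$$

Conversely, assuming that

$$F_{\xi_1 \ldots \xi_n}(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f_1(t_1) \cdots f_n(t_n)\,dt_1 \cdots dt_n,$$

we come to the equality

$$F_{\xi_1 \ldots \xi_n}(x_1, \ldots, x_n) = F_{\xi_1}(x_1) \cdots F_{\xi_n}(x_n).$$

The theorem is proved. □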


Now consider the discrete case. Assume for the sake of simplicity that the
components of $\xi$ may assume only integral values. Then for the independence of $\xi_j$ it is
necessary and sufficient that, for all $k_1, \ldots, k_n$,

$$\mathbf{P}(\xi_1 = k_1, \ldots, \xi_n = k_n) = \mathbf{P}(\xi_1 = k_1) \cdots \mathbf{P}(\xi_n = k_n).$$

Verifying this assertion causes no difficulties, and we leave it to the reader.
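The discrete criterion lends itself to a mechanical check for $n = 2$. The sketch below assumes the joint distribution is given as a finite table of exact probabilities; the function name and the two dice examples are illustrative.

```python
from collections import defaultdict
from fractions import Fraction


def is_independent(joint):
    """Check P(xi1 = k1, xi2 = k2) = P(xi1 = k1) P(xi2 = k2) for all k1, k2.

    joint maps pairs (k1, k2) to probabilities; missing pairs have probability 0.
    """
    marginal1, marginal2 = defaultdict(Fraction), defaultdict(Fraction)
    for (k1, k2), prob in joint.items():
        marginal1[k1] += prob
        marginal2[k2] += prob
    return all(
        joint.get((k1, k2), Fraction(0)) == marginal1[k1] * marginal2[k2]
        for k1 in marginal1
        for k2 in marginal2
    )


# Two separate fair dice versus one die read twice.
two_dice = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}
same_die = {(i, i): Fraction(1, 6) for i in range(1, 7)}
```

The first table factorises into its marginals, while the second does not, since a pair of distinct values has probability zero but the product of the marginals is 1/36.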

The  notion  of  independence  is  very  important  for  Probability  Theory  and  will  be 
used  throughout  the  entire  book.  Assume  that  we  are  formalising  a  practical  problem 
(constructing  an  appropriate  probability  model  in  which  various  random  variables 
are  to  be  present).  How  can  one  find  out  whether  the  random  variables  (or  events) 
to appear in the model are independent? In such situations it is a justified rule to
consider events and random variables with no causal connection as independent.
The detection of “probabilistic” independence in a mathematical model of a
random phenomenon is often connected with a deep understanding of its physical
essence.

Consider some simple examples. For instance, it is known that the probability
that a new-born child is a boy (event $A$) has a rather stable value $\mathbf{P}(A) = 22/43$.
If $B$ denotes the condition that the child is born on the day of the conjunction of
Jupiter and Mars, then, under the assumption that the position of the planets does not
determine individual fates of humans, the conditional probability $\mathbf{P}(A|B)$ will have
the same value: $\mathbf{P}(A|B) = 22/43$. That is, the actual counting of the frequency of
births of boys under these specific astrological conditions would give just the value
22/43. Although such a counting might never have been carried out at a sufficiently
large scale, we have no grounds to doubt its results.

Nevertheless, one should not treat the connection between “mathematical” and
causal independence as an absolute one. For instance, by Newton’s law of gravitation
the flight of a missile undoubtedly influences the simultaneous flight of another
missile. But it is evident that in practice one can ignore this influence. This example
also shows that independence of events and variables in the concrete and relative
meaning of this term does not contradict the principle of the universal
interdependence of all events.

It is also interesting to note that the formal definition of independence of events or
random variables is much wider than the notion of real independence in the sense of
affiliation to causally unrelated phenomena. This follows from the fact that
“mathematical” independence can occur even in cases where one has no reason to
assume the absence of a causal relation. We illustrate this statement by the following example.
Let $\eta$ be a random variable uniformly distributed over $[0, 1]$. Then in the expansion
of $\eta$ into a binary fraction

$$\eta = \frac{\xi_1}{2} + \frac{\xi_2}{4} + \frac{\xi_3}{8} + \cdots$$

the random variables $\xi_k$ will be independent (see Example 11.3.1), although they all
have a related origin.
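The independence of the binary digits can be checked exactly for finitely many digits. In the sketch below the interval $[0, 1)$ is split into $2^M$ dyadic cells of equal probability, on each of which the first $M$ digits are constant; the value of `M` and the function names are assumptions made for the illustration, and the ambiguity of expansions at dyadic points is ignored since it concerns a set of probability zero.

```python
from fractions import Fraction
from itertools import product

M = 8  # number of binary digits examined (an arbitrary choice)


def digit(cell, k):
    """k-th binary digit (k = 1, ..., M) shared by all points of the dyadic cell
    [cell / 2**M, (cell + 1) / 2**M)."""
    return (cell >> (M - k)) & 1


def probability(event):
    """Probability, under the uniform law on [0, 1), of a union of dyadic cells."""
    cells = [c for c in range(2**M) if event(c)]
    return Fraction(len(cells), 2**M)


def digits_independent(indices):
    """Exact check of the product rule for the digits with the given indices."""
    for bits in product((0, 1), repeat=len(indices)):
        joint = probability(lambda c: all(digit(c, k) == b for k, b in zip(indices, bits)))
        marginals = Fraction(1)
        for k, b in zip(indices, bits):
            marginals *= probability(lambda c, k=k, b=b: digit(c, k) == b)
        if joint != marginals:
            return False
    return True
```

Since each digit takes only the values 0 and 1, verifying the product rule for all combinations of values is enough to establish independence of the selected digits.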

One  can  see  that  this  circumstance  only  enlarges  the  area  of  applicability  of  all 
the  assertions  we  obtain  below  under  the  formal  condition  of  independence. 

The notion of independence of random variables is closely connected with that
of independence of σ-algebras.


3.4.2  Independence  of  Classes  of  Events 

Let $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$ be a probability space and $\mathcal{A}_1$ and $\mathcal{A}_2$ classes of events from the
σ-algebra $\mathfrak{F}$.

⁶ For a more detailed discussion of connections between causal and probabilistic independence, see
[24], from where we borrowed the above examples.

Definition 3.4.2 The classes of events $\mathcal{A}_1$ and $\mathcal{A}_2$ are said to be independent if, for
any events $A_1$ and $A_2$ such that $A_1 \in \mathcal{A}_1$ and $A_2 \in \mathcal{A}_2$, one has

$$\mathbf{P}(A_1 A_2) = \mathbf{P}(A_1)\,\mathbf{P}(A_2).$$


The  following  definition  introduces  the  notion  of  independence  of  a  sequence  of 
classes  of  events. 

Definition 3.4.3 Classes of events $\{\mathcal{A}_n\}_{n=1}^{\infty}$ are independent if, for any collection of
integers $n_1, \ldots, n_k$,

$$\mathbf{P}\Bigl(\bigcap_{j=1}^{k} A_{n_j}\Bigr) = \prod_{j=1}^{k} \mathbf{P}(A_{n_j})$$

for any $A_{n_j} \in \mathcal{A}_{n_j}$.

For instance, in a sequence of independent trials, the sub-σ-algebras of events
related to different trials will be independent. The independence of a sequence of
algebras of events also reduces to the independence of any finite collection of
algebras from the sequence. It is clear that subalgebras of events of independent algebras
are also independent.

Theorem 3.4.3 σ-algebras $\mathfrak{A}_1$ and $\mathfrak{A}_2$ generated, respectively, by independent
algebras of events $\mathcal{A}_1$ and $\mathcal{A}_2$ are independent.

Before proving this assertion we will obtain an approximation theorem which
will be useful for the sequel. By virtue of the theorem, any event $A$ from the
σ-algebra $\mathfrak{A}$ generated by an algebra $\mathcal{A}$ can, in a sense, be approximated by events
from $\mathcal{A}$. To be more precise, we introduce the “distance” between events defined by

$$d(A, B) = \mathbf{P}(A\overline{B} \cup \overline{A}B) = \mathbf{P}(A\overline{B}) + \mathbf{P}(\overline{A}B) = \mathbf{P}(A - B) + \mathbf{P}(B - A).$$

This distance possesses the following properties:

$$\begin{aligned}
d(\overline{A}, \overline{B}) &= d(A, B), \\
d(A, C) &\le d(A, B) + d(B, C), \\
d(AB, CD) &\le d(A, C) + d(B, D), \\
|\mathbf{P}(A) - \mathbf{P}(B)| &\le d(A, B).
\end{aligned} \tag{3.4.2}$$

The first relation is obvious. The triangle inequality follows from the fact that

$$\begin{aligned}
d(A, C) &= \mathbf{P}(A\overline{C}) + \mathbf{P}(\overline{A}C) = \mathbf{P}(A\overline{C}B) + \mathbf{P}(A\overline{C}\,\overline{B}) + \mathbf{P}(\overline{A}CB) + \mathbf{P}(\overline{A}C\overline{B}) \\
&\le \mathbf{P}(\overline{C}B) + \mathbf{P}(A\overline{B}) + \mathbf{P}(\overline{A}B) + \mathbf{P}(C\overline{B}) = d(A, B) + d(B, C).
\end{aligned}$$

The third relation in (3.4.2) can be obtained in a similar way by enlarging events
under the probability sign. Finally, the last inequality in (3.4.2) is a consequence of
the relations

$$\mathbf{P}(A) = \mathbf{P}(A\overline{B}) + \mathbf{P}(AB) = \mathbf{P}(B) - \mathbf{P}(B\overline{A}) + \mathbf{P}(A\overline{B}).$$

Theorem 3.4.4 (The approximation theorem) Let $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$ be a probability space
and $\mathfrak{A}$ the σ-algebra generated by an algebra $\mathcal{A}$ of events from $\mathfrak{F}$. Then, for any
$A \in \mathfrak{A}$, there exists a sequence $A_n \in \mathcal{A}$ such that

$$\lim_{n \to \infty} d(A, A_n) = 0. \tag{3.4.3}$$

By the last inequality from (3.4.2), the assertion of the theorem means that
$\mathbf{P}(A) = \lim_{n \to \infty} \mathbf{P}(A_n)$ and that each event $A \in \mathfrak{A}$ can be represented, up to a set of
zero probability, as a limit of a sequence of events from the generating algebra $\mathcal{A}$
(see also Appendix 1).

Proof We will call an event $A \in \mathfrak{F}$ approximable if there exists a sequence $A_n \in \mathcal{A}$
possessing property (3.4.3), i.e. $d(A_n, A) \to 0$.

Since $d(A, A) = 0$, the class of approximable events $\mathfrak{A}^*$ contains $\mathcal{A}$. Therefore
to prove the theorem it suffices to verify that $\mathfrak{A}^*$ is a σ-algebra.

The fact that $\mathfrak{A}^*$ is an algebra is obvious, for the relations $A \in \mathfrak{A}^*$ and
$B \in \mathfrak{A}^*$ imply that $\overline{A}$, $A \cup B$, $A \cap B \in \mathfrak{A}^*$. (For instance, if $d(A, A_n) \to 0$ and
$d(B, B_n) \to 0$, then by the third inequality in (3.4.2) one has $d(AB, A_nB_n) \le
d(A, A_n) + d(B, B_n) \to 0$, so that $AB \in \mathfrak{A}^*$.)

Now let $C = \bigcup_{k=1}^{\infty} C_k$ where $C_k \in \mathfrak{A}^*$. Since $\mathfrak{A}^*$ is an algebra, we have $D_n =
\bigcup_{k=1}^{n} C_k \in \mathfrak{A}^*$; moreover,

$$d(D_n, C) = \mathbf{P}(C - D_n) = \mathbf{P}(C) - \mathbf{P}(D_n) \to 0.$$

Therefore one can choose $A_n \in \mathcal{A}$ so that $d(D_n, A_n) < 1/n$, and consequently by
virtue of (3.4.2) we have

$$d(C, A_n) \le d(C, D_n) + d(D_n, A_n) \to 0.$$

Thus $C \in \mathfrak{A}^*$ and hence $\mathfrak{A}^*$ forms a σ-algebra. The theorem is proved. □

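Proof of Theorem 3.4.3 is now easy. If $A_1 \in \mathfrak{A}_1$ and $A_2 \in \mathfrak{A}_2$, then by Theorem 3.4.4
there exist sequences $A_{1n} \in \mathcal{A}_1$ and $A_{2n} \in \mathcal{A}_2$ such that $d(A_i, A_{in}) \to 0$ as $n \to \infty$,
$i = 1, 2$. Putting $B = A_1A_2$ and $B_n = A_{1n}A_{2n}$, we obtain that

$$d(B, B_n) \le d(A_1, A_{1n}) + d(A_2, A_{2n}) \to 0$$

as $n \to \infty$ and

$$\mathbf{P}(A_1A_2) = \lim_{n \to \infty} \mathbf{P}(B_n) = \lim_{n \to \infty} \mathbf{P}(A_{1n})\,\mathbf{P}(A_{2n}) = \mathbf{P}(A_1)\,\mathbf{P}(A_2). \quad □$$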
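The four properties (3.4.2) of the distance $d$ used in this subsection can be verified by brute force on a small sample space. The sketch below assumes four equally likely outcomes and enumerates all events; the names `OMEGA`, `prob` and `all_events` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset(range(4))  # finite sample space with equally likely outcomes


def prob(event):
    """Probability of an event under the uniform law on OMEGA."""
    return Fraction(len(event), len(OMEGA))


def d(a, b):
    """d(A, B) = P(A - B) + P(B - A), the probability of the symmetric difference."""
    return prob(a - b) + prob(b - a)


def all_events():
    """All subsets of OMEGA."""
    points = sorted(OMEGA)
    return [frozenset(c) for r in range(len(points) + 1) for c in combinations(points, r)]


events = all_events()
complement_ok = all(d(OMEGA - a, OMEGA - b) == d(a, b) for a in events for b in events)
triangle_ok = all(d(a, c) <= d(a, b) + d(b, c) for a in events for b in events for c in events)
product_ok = all(
    d(a & b, c & e) <= d(a, c) + d(b, e)
    for a in events for b in events for c in events for e in events
)
difference_ok = all(abs(prob(a) - prob(b)) <= d(a, b) for a in events for b in events)
```

Such an enumeration is of course no substitute for the proofs given above, but it is a convenient way to catch a mis-stated inequality.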


3.4.3  Relations  Between  the  Introduced  Notions 

We will need one more definition. Let $\xi$ be a random variable (or vector) given on a
probability space $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$.

⁷ The theorem is also a direct consequence of the lemma from Appendix 1.

Definition 3.4.4 The class $\mathfrak{F}_\xi$ of events from $\mathfrak{F}$ of the form $A = \xi^{-1}(B) =
\{\omega : \xi(\omega) \in B\}$, where $B$ are Borel sets, is called the σ-algebra generated by the
random variable $\xi$.


It is evident that $\mathfrak{F}_\xi$ is a σ-algebra since to each operation on sets $A$ there
corresponds the same operation on the sets $B = \xi(A)$ forming a σ-algebra.

The σ-algebra $\mathfrak{F}_\xi$ generated by the random variable $\xi$ will also be denoted by
$\sigma(\xi)$.

Consider, for instance, a probability space $\langle \Omega, \mathfrak{B}, \mathbf{P} \rangle$, where $\Omega = \mathbb{R}$ is the real
line and $\mathfrak{B}$ is the σ-algebra of Borel sets. If

$$\xi = \xi(\omega) = \begin{cases} 0, & \omega < 0, \\ 1, & \omega \ge 0, \end{cases}$$

then $\mathfrak{F}_\xi$ clearly consists of four sets: $\mathbb{R}$, $\emptyset$, $\{\omega < 0\}$ and $\{\omega \ge 0\}$. Such a random
variable $\xi$ cannot distinguish “finer” sets from $\mathfrak{B}$. On the other hand, it is obvious
that $\xi$ will be measurable ($\{\xi \in B\} \in \mathfrak{B}_1$) with respect to any other “richer”
sub-σ-algebra $\mathfrak{B}_1$, such that $\sigma(\xi) \subset \mathfrak{B}_1 \subset \mathfrak{B}$.
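On a finite sample space the σ-algebra generated by a random variable can be listed explicitly as the family of preimages. The sketch below uses a finite set of integers as a stand-in for the real line and a two-valued variable that separates negative from non-negative points; both are assumptions made for illustration, as are the function names.

```python
from itertools import combinations


def generated_sigma_algebra(omega, xi):
    """All preimages xi^{-1}(B) over subsets B of the value set of xi."""
    values = sorted({xi(w) for w in omega})
    events = set()
    for r in range(len(values) + 1):
        for chosen in combinations(values, r):
            events.add(frozenset(w for w in omega if xi(w) in chosen))
    return events


def sign_indicator(w):
    """Two-valued variable: 0 for negative points, 1 otherwise."""
    return 0 if w < 0 else 1


omega = range(-3, 4)  # finite stand-in for the real line (an assumption)
sigma = generated_sigma_algebra(omega, sign_indicator)
```

For a variable with $m$ distinct values the construction returns $2^m$ events, so a two-valued variable yields exactly four sets: the empty set, the two level sets and the whole space.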

If $\xi = \xi(\omega) = [\omega]$ is the integral part of $\omega$, then $\mathfrak{F}_\xi$ will be the σ-algebra of sets
composed of the events $\{k \le \omega < k + 1\}$, $k = \ldots, -1, 0, 1, \ldots$

Finally, if $\xi(\omega) = \varphi(\omega)$ where $\varphi$ is continuous and monotone, $\varphi(\infty) = \infty$ and
$\varphi(-\infty) = -\infty$, then $\mathfrak{F}_\xi$ coincides with the σ-algebra of Borel sets $\mathfrak{B}$.


Lemma 3.4.1 Let $\xi$ and $\eta$ be two random variables given on $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$, the variable
$\xi$ being measurable with respect to $\sigma(\eta)$. Then $\xi$ and $\eta$ are functionally related, i.e.
there exists a Borel function $g$ such that $\xi = g(\eta)$.

Proof By assumption,

$$A_{k,n} = \left\{ \xi \in \left[ \frac{k}{2^n}, \frac{k+1}{2^n} \right) \right\} \in \sigma(\eta).$$

Denote by $B_{k,n} = \{\eta(\omega) : \omega \in A_{k,n}\}$ the images of the sets $A_{k,n}$ on the line $\mathbb{R}$ under
the mapping $\eta(\omega)$ and put $g_n(x) = k/2^n$ for $x \in B_{k,n}$. Then $g_n(\eta) = [2^n \xi]/2^n$ and
because $A_{k,n} \in \sigma(\eta)$, $B_{k,n} \in \mathfrak{B}$ and $g_n$ is a Borel function. Since $g_n(x) \uparrow$ for any $x$,
the limit $\lim_{n \to \infty} g_n(x) = g(x)$ exists and is also a Borel function. It remains to
observe that $\xi = \lim_{n \to \infty} g_n(\eta) = g(\eta)$ by the very construction. □
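The proof rests on the dyadic approximations $[2^n \xi]/2^n$, which increase to $\xi$ as $n$ grows. A minimal numerical sketch of this monotone approximation follows; the test point and the range of $n$ are arbitrary, and `g_n` here acts directly on a real number rather than on $\eta$.

```python
import math


def g_n(x, n):
    """Dyadic approximation [2^n x] / 2^n from below."""
    return math.floor(2**n * x) / 2**n


x = math.pi - 5.0  # a negative test point, chosen arbitrarily
approximations = [g_n(x, n) for n in range(12)]
gaps = [x - value for value in approximations]
```

Each approximation lies within $2^{-n}$ below the target, and refining the dyadic grid can only move the approximation upwards, which is exactly the monotonicity used to pass to the limit.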


Now we formulate an evident proposition relating independence of random
variables and σ-algebras.

Random variables $\xi_1, \ldots, \xi_n$ are independent if and only if the σ-algebras
$\sigma(\xi_1), \ldots, \sigma(\xi_n)$ are independent.

This is a direct consequence of the definitions of independence of random
variables and σ-algebras.

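Now we can prove Theorem 3.4.1. First note that finite unions of semi-intervals
$[\cdot\,, \cdot)$ (perhaps with infinite end points) form an algebra $\mathcal{A}$ generating the Borel
σ-algebra on the line: $\mathfrak{B} = \sigma(\mathcal{A})$.

Proof of Theorem 3.4.1 Since in one direction the assertion of the theorem is
obvious, it suffices to verify that the equality $F(x_1, \ldots, x_n) = F_{\xi_1}(x_1) \cdots F_{\xi_n}(x_n)$ for
the joint distribution function implies the independence of $\sigma(\xi_1), \ldots, \sigma(\xi_n)$. Put for
simplicity $n = 2$ and denote by $\Delta$ and $\Lambda$ the semi-intervals $[x_1, x_2)$ and $[y_1, y_2)$,
respectively. The following equalities hold:

$$\begin{aligned}
\mathbf{P}(\xi_1 \in \Delta,\ \xi_2 \in \Lambda) &= \mathbf{P}(\xi_1 \in [x_1, x_2),\ \xi_2 \in [y_1, y_2)) \\
&= F(x_2, y_2) - F(x_1, y_2) - F(x_2, y_1) + F(x_1, y_1) \\
&= \bigl(F_{\xi_1}(x_2) - F_{\xi_1}(x_1)\bigr)\bigl(F_{\xi_2}(y_2) - F_{\xi_2}(y_1)\bigr) \\
&= \mathbf{P}\{\xi_1 \in \Delta\}\,\mathbf{P}\{\xi_2 \in \Lambda\}.
\end{aligned}$$

Consequently, if $\Delta_i$, $i = 1, \ldots, n$, and $\Lambda_j$, $j = 1, \ldots, m$, are two systems of
disjoint semi-intervals, then

$$\begin{aligned}
\mathbf{P}\Bigl(\xi_1 \in \bigcup_{i=1}^{n} \Delta_i,\ \xi_2 \in \bigcup_{j=1}^{m} \Lambda_j\Bigr)
&= \sum_{i,j} \mathbf{P}(\xi_1 \in \Delta_i,\ \xi_2 \in \Lambda_j) = \sum_{i,j} \mathbf{P}(\xi_1 \in \Delta_i)\,\mathbf{P}(\xi_2 \in \Lambda_j) \\
&= \mathbf{P}\Bigl(\xi_1 \in \bigcup_{i=1}^{n} \Delta_i\Bigr)\,\mathbf{P}\Bigl(\xi_2 \in \bigcup_{j=1}^{m} \Lambda_j\Bigr).
\end{aligned} \tag{3.4.4}$$

But the class of events $\{\omega : \xi(\omega) \in A\} = \xi^{-1}(A)$, where $A \in \mathcal{A}$, forms, along with $\mathcal{A}$,
an algebra (we will denote it by $\alpha(\xi)$), and one has $\sigma(\alpha(\xi)) = \sigma(\xi)$. In (3.4.4)
we proved that $\alpha(\xi_1)$ and $\alpha(\xi_2)$ are independent. Therefore by Theorem 3.4.3 the
σ-algebras $\sigma(\xi_1) = \sigma(\alpha(\xi_1))$ and $\sigma(\xi_2) = \sigma(\alpha(\xi_2))$ are also independent. The
theorem is proved. □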


It  is  convenient  to  state  the  following  fact  as  a  theorem. 

Theorem 3.4.5 Let $\varphi_1$ and $\varphi_2$ be Borel functions and $\xi_1$ and $\xi_2$ be independent
random variables. Then $\eta_1 = \varphi_1(\xi_1)$ and $\eta_2 = \varphi_2(\xi_2)$ are also independent random
variables.

Proof We have to verify that, for any Borel sets $B_1$ and $B_2$,

$$\mathbf{P}(\varphi_1(\xi_1) \in B_1,\ \varphi_2(\xi_2) \in B_2) = \mathbf{P}(\varphi_1(\xi_1) \in B_1)\,\mathbf{P}(\varphi_2(\xi_2) \in B_2). \tag{3.4.5}$$

But the sets $\{x : \varphi_i(x) \in B_i\} = \varphi_i^{-1}(B_i) = B_i^*$, $i = 1, 2$, are again Borel sets.
Therefore

$$\{\omega : \varphi_i(\xi_i) \in B_i\} = \{\omega : \xi_i \in B_i^*\},$$

and the required multiplicativity of probability (3.4.5) follows from the
independence of $\xi_i$. The theorem is proved. □

Let $\{\xi_n\}_{n=1}^{\infty}$ be a sequence of independent random variables. Consider the random
variables $\xi_k, \xi_{k+1}, \ldots, \xi_m$ where $k < m \le \infty$. Denote by $\sigma(\xi_k, \ldots, \xi_m)$ (for $m = \infty$
we will write $\sigma(\xi_k, \xi_{k+1}, \ldots)$) the σ-algebra generated by the events

$$\bigcap_{i=k}^{m} A_i,$$

where $A_i \in \sigma(\xi_i)$.

Definition 3.4.5 The σ-algebra $\sigma(\xi_k, \ldots, \xi_m)$ is said to be generated by the random
variables $\xi_k, \ldots, \xi_m$.

In  the  sequel  we  will  need  the  following  proposition. 

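Theorem 3.4.6 For any $k \ge 1$, the σ-algebra $\sigma(\xi_{n+k})$ is independent of
$\sigma(\xi_1, \ldots, \xi_n)$.

Proof To prove the assertion, we make use of Theorem 3.4.3. To this end we have
to verify that the algebra $\mathcal{A}$ generated by sets of the form $B = \bigcap_{i=1}^{n} A_i$, where
$A_i \in \sigma(\xi_i)$, is independent of $\sigma(\xi_{n+k})$. Let $A \in \sigma(\xi_{n+k})$, then it follows from the
independence of the σ-algebras $\sigma(\xi_1), \sigma(\xi_2), \ldots, \sigma(\xi_n), \sigma(\xi_{n+k})$ that

$$\mathbf{P}(AB) = \mathbf{P}(A)\,\mathbf{P}(A_1) \cdots \mathbf{P}(A_n) = \mathbf{P}(A) \cdot \mathbf{P}(B).$$

In a similar way we verify that

$$\mathbf{P}\Bigl(\bigcup_{i=1}^{m} B_i A\Bigr) = \mathbf{P}\Bigl(\bigcup_{i=1}^{m} B_i\Bigr)\,\mathbf{P}(A)$$

(one just has to represent $\bigcup_{i=1}^{m} B_i$ as a union of disjoint events from $\mathcal{A}$). Thus the
algebra $\mathcal{A}$ is independent of $\sigma(\xi_{n+k})$. Hence $\sigma(\xi_1, \ldots, \xi_n)$ and $\sigma(\xi_{n+k})$ are
independent. The theorem is proved. □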

It is not hard to see that similar conclusions can be made about vector-valued
random variables $\xi_1, \xi_2, \ldots$ defining their independence using the relation

$$\mathbf{P}(\xi_1 \in B_1, \ldots, \xi_n \in B_n) = \prod_{j=1}^{n} \mathbf{P}(\xi_j \in B_j),$$

where $B_j$ are Borel sets in spaces of respective dimensions.

In  conclusion  of  this  section  note  that  one  can  always  construct  a  probability 
space  (£2,  P)  ((M7\  P^))  on  which  independent  random  variables  §i, . . . ,  t=n 

with  prescribed  distribution  functions  F £ .  are  given  whenever  these  distributions 
F^j  are  known.  This  follows  immediately  from  Sect.  3.3,  since  in  our  case  the  joint 
distribution  function  F^  (x\ , . . . ,  xn)  of  the  vector  §  =  (§i  is  uniquely  deter¬ 

mined  by  the  distribution  functions  Ft.  (v)  of  the  variables  : 

n 

F^(xi,...,xn)  =  Y\  (xj). 

l 


3.5 * On Infinite Sequences of Random Variables

We  have  already  mentioned  infinite  sequences  of  random  variables.  Such  sequences 
will  repeatedly  be  objects  of  our  studies  below.  However,  there  arises  the  question 
of  whether  one  can  define  an  infinite  sequence  on  a  probability  space  in  such  a  way 
that  its  components  possess  certain  prescribed  properties  (for  instance,  that  they  will 
be  independent  and  identically  distributed). 

As we saw, one can always define a finite sequence of independent random variables
by choosing for the “compound” random variable $(\xi_1, \ldots, \xi_n)$ the sample
space $\mathbb{R}_1 \times \mathbb{R}_2 \times \cdots \times \mathbb{R}_n = \mathbb{R}^n$ and the $\sigma$-algebra
$\mathfrak{B}_1 \times \mathfrak{B}_2 \times \cdots \times \mathfrak{B}_n = \mathfrak{B}^n$ generated by sets of the form
$B_1 \times B_2 \times \cdots \times B_n \subset \mathbb{R}^n$, $B_i$ being Borel sets. It suffices
to define probability on the algebra of these sets. In the infinite-dimensional case,
however, the situation is more complicated. Theorem 3.2.1 and its extensions to the
multivariate case are insufficient here. One should define probability on an algebra
of events from $\mathbb{R}^\infty = \prod_{k=1}^{\infty} \mathbb{R}_k$ so that its closure under countably many operations
$\cup$ and $\cap$ forms the $\sigma$-algebra $\mathfrak{B}^\infty$ generated by the products $\prod B_{j_k}$, $B_{j_k} \in \mathfrak{B}_{j_k}$.

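Let $N$ be a subset of integers. Denote by $\mathbb{R}^N = \prod_{k \in N} \mathbb{R}_k$ the direct product of
the spaces $\mathbb{R}_k$ over $k \in N$, and put $\mathfrak{B}^N = \prod_{k \in N} \mathfrak{B}_k$. We say that distributions
$\mathbf{P}_{N'}$ and $\mathbf{P}_{N''}$ on $\langle \mathbb{R}^{N'}, \mathfrak{B}^{N'} \rangle$ and $\langle \mathbb{R}^{N''}, \mathfrak{B}^{N''} \rangle$, respectively, are
consistent if the measures induced by $\mathbf{P}_{N'}$ and $\mathbf{P}_{N''}$ on the intersection
$\mathbb{R}^N = \mathbb{R}^{N'} \cap \mathbb{R}^{N''}$ (here $N = N' \cap N''$) coincide with each other. The measures on
$\mathbb{R}^N$ are said to be the projections of $\mathbf{P}_{N'}$ and $\mathbf{P}_{N''}$, respectively, on $\mathbb{R}^N$.
An answer to the above question about the existence of an
infinite sequence of random variables is given by the following theorem (the proof
of which is given in Appendix 2).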

Theorem 3.5.1 (Kolmogorov) Specifying a family of consistent distributions $\mathbf{P}_N$
on finite-dimensional spaces $\mathbb{R}^N$ defines a unique probability measure $\mathbf{P}_\infty$ on
$\langle \mathbb{R}^\infty, \mathfrak{B}^\infty \rangle$ such that each probability $\mathbf{P}_N$ is the projection of $\mathbf{P}_\infty$ onto $\mathbb{R}^N$.

It follows from this theorem, in particular, that one can always define on an appropriate
space an infinite sequence of arbitrary independent random variables. Indeed,
direct products of measures given on $\mathbb{R}_1, \mathbb{R}_2, \ldots$ for different products $\mathbb{R}^{N'}$ and $\mathbb{R}^{N''}$
are always consistent.


3.6  Integrals 

3.6.1  Integral  with  Respect  to  Measure 

As  we  have  already  noted,  defining  a  probability  space  includes  specifying  a  finite 
countably  additive  measure.  This  enables  one  to  consider  integrals  with  respect  to 
the  measure, 

$$\int_{\Omega} g(\xi(\omega))\,\mathbf{P}(d\omega) \tag{3.6.1}$$

over the set $\Omega$ for a Borel function $g$ and any random variable $\xi$ on $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$ (recall
that $g(x)$ is said to be Borel if, for any $t$, $\{x : g(x) < t\}$ is a Borel set on the real
line).

The  definition,  construction  and  basic  properties  of  the  integral  with  respect  to  a 
measure  are  assumed  to  be  familiar  to  the  reader.  If  the  reader  feels  his  or  her  back¬ 
ground  is  insufficient  in  this  aspect,  we  recommend  Appendix  3  which  contains  all 
the  necessary  information.  However,  the  reader  could  skip  this  material  if  he/she  is 
willing  to  restrict  him/herself  to  considering  only  discrete  or  absolutely  continuous 
distributions  for  which  integrals  with  respect  to  a  measure  become  sums  or  conven¬ 
tional  Riemann  integrals.  It  would  also  be  useful  for  the  sequel  to  know  the  Stieltjes 
integral;  see  the  comments  in  the  next  subsection. 

We already know that a random variable $\xi(\omega)$ induces a measure $\mathbf{F}_\xi$ on the real
line which is specified by the equality

$$\mathbf{F}_\xi([x, y)) = \mathbf{P}(x \le \xi < y) = F_\xi(y) - F_\xi(x).$$

Using this measure, one can write the integral (3.6.1) as

$$\int_{\Omega} g(\xi(\omega))\,\mathbf{P}(d\omega) = \int_{\mathbb{R}} g(x)\,\mathbf{F}_\xi(dx).$$

This is just the result of the substitution $x = \xi(\omega)$. It can be proved simply by
writing down the definitions of both integrals. The integral on the right-hand side
is called the Lebesgue-Stieltjes integral of the function $g(x)$ with respect to the
measure $\mathbf{F}_\xi$ and can also be written as

$$\int g(x)\,dF_\xi(x). \tag{3.6.2}$$


3.6.2  The  Stieltjes  Integral 


The  integral  (3.6.2)  is  often  just  called  the  Stieltjes  integral,  or  the  Riemann-Stieltjes 
integral  which  is  defined  in  a  somewhat  different  way  and  for  a  narrower  class  of 
functions. 

If $g(x)$ is a continuous function, then the Lebesgue-Stieltjes integral coincides
with the Riemann-Stieltjes integral which is equal by definition to

$$\int g(x)\,dF(x) = \lim_{\substack{b \to \infty \\ a \to -\infty}} \lim_{N \to \infty} \sum_{k=0}^{N} g(\tilde{x}_k)\bigl[F(x_{k+1}) - F(x_k)\bigr], \tag{3.6.3}$$

where the limit on the right-hand side does not depend on the choice of partitions
$x_0, x_1, \ldots, x_N$ of the semi-intervals $[a, b)$ and points $\tilde{x}_k \in \Delta_k = [x_k, x_{k+1})$.
Partitions $x_0, x_1, \ldots, x_N$ are different for different $N$’s and have the property that
$\max_k (x_{k+1} - x_k) \to 0$ as $N \to \infty$.
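
As a numerical illustration of definition (3.6.3), the following Python sketch evaluates the integral sum on one finite semi-interval $[a, b)$, using equal cells and their left end points as the tags. The function names, the choice $F(x) = 1 - e^{-x}$ for $x > 0$, the truncation point $b = 40$ and the number of cells are assumptions made for this illustration only; they do not come from the text.

```python
import math


def riemann_stieltjes_sum(g, F, a, b, n):
    """Integral sum from (3.6.3) on [a, b): n equal cells, left end points as tags."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x_k = a + k * h          # left end point of the k-th cell, used as the tag
        x_next = x_k + h         # right end point of the k-th cell
        total += g(x_k) * (F(x_next) - F(x_k))
    return total


def exp_cdf(x):
    """Distribution function of the exponential law with parameter 1 (illustrative choice)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


# Approximates the integral of x dF(x); the mass beyond b = 40 is negligible here.
approx_mean = riemann_stieltjes_sum(lambda x: x, exp_cdf, 0.0, 40.0, 200_000)
print(approx_mean)
```

For this choice of $F$ the integral of $x\,dF(x)$ equals 1, and the sum approaches that value from below as the cells shrink, because left tags underestimate an increasing $g$.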

Indeed, as we know (see Appendix 3), the Lebesgue-Stieltjes integral is

$$\int g(x)\,dF(x) = \lim_{\substack{b \to \infty \\ a \to -\infty}} \lim_{N \to \infty} \int_a^b g_N(x)\,\mathbf{F}(dx), \tag{3.6.4}$$

where $g_N$ is any sequence of simple functions (assuming finitely many values) converging
monotonically to $g(x)$. We see from these definitions that it suffices to show
that the integrals $\int_a^b$ with finite integration limits coincide. Since the Lebesgue-Stieltjes
integral $\int_a^b g\,dF$ of a continuous function $g$ always exists, we could obtain
its value by taking the sequence $g_N$ to be either of the two sequences of simple functions
$g_N^{*}$ and $g_N^{**}$ which are constant on the semi-intervals $\Delta_k$ and equal on them to

$$g_N^{*}(x_k) = \sup_{x \in \Delta_k} g(x) \quad \text{and} \quad g_N^{**}(x_k) = \inf_{x \in \Delta_k} g(x),$$

respectively. Both sequences in (3.6.4) constructed from $g_N^{*}$ and $g_N^{**}$ will clearly
converge monotonically from different sides to the same limit equal to the
Lebesgue-Stieltjes integral

$$\int_a^b g(x)\,dF(x).$$

But for any $\tilde{x}_k \in \Delta_k$, one has

$$g_N^{**}(x_k) \le g(\tilde{x}_k) \le g_N^{*}(x_k),$$

and therefore the integral sum in (3.6.3) will be between the bounds

$$\int_a^b g_N^{**}\,dF(x) \le \sum_{k=0}^{N} g(\tilde{x}_k)\bigl[F(x_{k+1}) - F(x_k)\bigr] \le \int_a^b g_N^{*}\,dF(x).$$

These inequalities prove the required assertion about the coincidence of the integrals.

It is not hard to verify that (3.6.3) and (3.6.4) will also coincide when $F(x)$ is
continuous and $g(x)$ is a function of bounded variation. In that case,

$$\int_a^b g(x)\,dF(x) = g(x)F(x)\Big|_a^b - \int_a^b F(x)\,dg(x).$$

Making use of this fact, we can extend the definition of the Riemann-Stieltjes integral
to the case when $g(x)$ is a function of bounded variation and $F(x)$ is an
arbitrary distribution function. Indeed, let $F(x) = F_c(x) + F_d(x)$ be a representation
of $F(x)$ as a sum of its continuous and discrete components, and $y_1, y_2, \ldots$ be
the jump points of $F_d(x)$:

$$p_k = F_d(y_k + 0) - F_d(y_k) > 0.$$

Then one has to put by definition

$$\int g(x)\,dF(x) = \sum_k p_k\,g(y_k) + \int g(x)\,dF_c(x),$$

where the Riemann-Stieltjes integral $\int g\,dF_c(x)$ can be understood, as we have
already noted, in the sense of definition (3.6.3).

We will say, as is generally accepted, that $\int g\,dF$ exists if the integral $\int |g|\,dF$
is finite. It is easy to see from the definition of the Stieltjes integral that, for step
functions $F(x)$ (the distribution is discrete), the integral becomes the sum

$$\int g(x)\,dF(x) = \sum_k g(x_k)\bigl(F(x_k + 0) - F(x_k)\bigr) = \sum_k g(x_k)\,\mathbf{P}(\xi = x_k),$$

where $x_1, x_2, \ldots$ are jump points of $F(x)$. If

$$F(x) = \int_{-\infty}^{x} p(u)\,du$$

is absolutely continuous and $p(x)$ and $g(x)$ are Riemann integrable, then the Stieltjes
integral

$$\int g(x)\,dF(x) = \int g(x)\,p(x)\,dx$$

becomes a conventional Riemann integral.

We  again  note  that  for  a  reader  who  is  not  familiar  with  Stieltjes  integral  tech¬ 
niques  and  integration  with  respect  to  measures,  it  is  possible  to  continue  reading 
the  book  keeping  in  mind  only  the  last  two  interpretations  of  the  integral.  This  would 
be  quite  sufficient  for  an  understanding  of  the  exposition.  Moreover,  most  of  the 
distributions  which  are  important  from  the  practical  point  of  view  are  just  of  one  of 
these  types:  either  discrete  or  absolutely  continuous. 

We  recall  some  other  properties  of  the  Stieltjes  integral  (following  immediately 
from  definitions  (3.6.4)  or  (3.6.3)  and  (3.6.5)): 


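$$\int_a^b dF = F(b) - F(a);$$

$$\int_a^b g\,dF = \int_a^c g\,dF + \int_c^b g\,dF \quad \text{if } g \text{ or } F \text{ is continuous at the point } c;$$

$$\int (g_1 + g_2)\,dF = \int g_1\,dF + \int g_2\,dF;$$

$$\int c\,g\,dF = c \int g\,dF \quad \text{for } c = \text{const};$$

$$\int_a^b g\,dF = gF\Big|_a^b - \int_a^b F\,dg \quad \text{if } g \text{ is a function of bounded variation.}$$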


3.6.3  Integrals  of  Multivariate  Random  Variables. 

The  Distribution  of  the  Sum  of  Independent 
Random  Variables 

Integrals with respect to measure (3.6.1) make sense for multivariate variables
$\xi(\omega) = (\xi_1(\omega), \ldots, \xi_n(\omega))$ as well (one cannot say the same about Riemann-Stieltjes
integrals (3.6.3)). We mean here the integral

$$\int_{\Omega} g(\xi_1(\omega), \ldots, \xi_n(\omega))\,\mathbf{P}(d\omega), \tag{3.6.5}$$

where $g$ is a measurable function mapping $\mathbb{R}^n$ into $\mathbb{R}$, so that $g(\xi_1(\omega), \ldots, \xi_n(\omega))$
is a measurable mapping of $\Omega$ into $\mathbb{R}$.

If $\langle \mathbb{R}^n, \mathfrak{B}^n, \mathbf{F}_\xi \rangle$ is a sample probability space for $\xi$, then the integral (3.6.5) can
be written as

$$\int_{\mathbb{R}^n} g(x)\,\mathbf{F}_\xi(dx), \qquad x = (x_1, \ldots, x_n) \in \mathbb{R}^n.$$

Now turn to the case when the components $\xi_1, \ldots, \xi_n$ of the vector $\xi$ are independent
and assume first that $n = 2$. For sets

$$B = B_1 \times B_2 = \{(x_1, x_2) : x_1 \in B_1,\ x_2 \in B_2\} \subset \mathbb{R}^2,$$

where $B_1$ and $B_2$ are measurable subsets of $\mathbb{R}$, one has the equality

$$\mathbf{P}(\xi \in B) = \mathbf{P}(\xi_1 \in B_1, \xi_2 \in B_2) = \mathbf{P}(\xi_1 \in B_1)\,\mathbf{P}(\xi_2 \in B_2). \tag{3.6.6}$$

In that case one says that the measure $\mathbf{F}_{\xi_1 \xi_2}(dx_1, dx_2) = \mathbf{P}(\xi_1 \in dx_1, \xi_2 \in dx_2)$
on $\mathbb{R}^2$, corresponding to $(\xi_1, \xi_2)$, is a direct product of the measures

$$\mathbf{F}_{\xi_1}(dx_1) = \mathbf{P}(\xi_1 \in dx_1) \quad \text{and} \quad \mathbf{F}_{\xi_2}(dx_2) = \mathbf{P}(\xi_2 \in dx_2).$$

As we already know, equality (3.6.6) uniquely specifies a measure on $\langle \mathbb{R}^2, \mathfrak{B}^2 \rangle$
from the given distributions of $\xi_1$ and $\xi_2$ on $\langle \mathbb{R}, \mathfrak{B} \rangle$. It turns out that the integral

$$\int g(x_1, x_2)\,\mathbf{F}_{\xi_1 \xi_2}(dx_1, dx_2) \tag{3.6.7}$$

with respect to the measure $\mathbf{F}_{\xi_1 \xi_2}$ can be expressed in terms of integrals with respect
to the measures $\mathbf{F}_{\xi_1}$ and $\mathbf{F}_{\xi_2}$. Namely, Fubini’s theorem holds true (for the proof see
Appendix 3 or property 5A in Sect. 4.8).


Theorem 3.6.1 (Theorem on iterated integration) For a Borel function $g(x, y) \ge 0$
and independent $\xi_1$ and $\xi_2$,

$$\int g(x_1, x_2)\,\mathbf{F}_{\xi_1 \xi_2}(dx_1, dx_2) = \int \Bigl[\int g(x_1, x_2)\,\mathbf{F}_{\xi_2}(dx_2)\Bigr]\mathbf{F}_{\xi_1}(dx_1). \tag{3.6.8}$$

If $g(x, y)$ can assume values of different signs, then the existence of the integral
on the left-hand side of (3.6.8) is required for the equality (3.6.8). The order of
integration on the right-hand side of (3.6.8) may be changed.

It is shown in Appendix 3 that the measurability of $g(x, y)$ implies that of the
integrands on the right-hand side of (3.6.8).

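Corollary 3.6.1 Let $g(x_1, x_2) = g_1(x_1)g_2(x_2)$. Then, if at least one of the following
three conditions is met:

(1) $g_1 \ge 0$, $g_2 \ge 0$,

(2) $\int g_1(x_1)g_2(x_2)\,\mathbf{F}_{\xi_1 \xi_2}(dx_1, dx_2)$ exists,

(3) $\int g_j(x_j)\,\mathbf{F}_{\xi_j}(dx_j)$, $j = 1, 2$, exist,

then

$$\int g_1(x_1)g_2(x_2)\,\mathbf{F}_{\xi_1 \xi_2}(dx_1, dx_2) = \int g_1(x_1)\,\mathbf{F}_{\xi_1}(dx_1) \int g_2(x_2)\,\mathbf{F}_{\xi_2}(dx_2). \tag{3.6.9}$$

To avoid trivial complications, we assume that $\mathbf{P}(g_j(\xi_j) = 0) \ne 1$, $j = 1, 2$.

Proof Under any of the first two conditions, the assertion of the corollary follows
immediately from Fubini’s theorem. For arbitrary $g_1$, $g_2$, put $g_j = g_j^{+} - g_j^{-}$, $g_j^{\pm} \ge 0$,
$j = 1, 2$. If $\int g_j^{\pm}\,d\mathbf{F}_{\xi_j} < \infty$ (we will use here the abridged notation for integrals),
then

$$\begin{aligned}
\int g_1 g_2\,d\mathbf{F}_{\xi_1}\,d\mathbf{F}_{\xi_2}
&= \int g_1^{+} g_2^{+}\,d\mathbf{F}_{\xi_1}\,d\mathbf{F}_{\xi_2} - \int g_1^{+} g_2^{-}\,d\mathbf{F}_{\xi_1}\,d\mathbf{F}_{\xi_2} \\
&\quad - \int g_1^{-} g_2^{+}\,d\mathbf{F}_{\xi_1}\,d\mathbf{F}_{\xi_2} + \int g_1^{-} g_2^{-}\,d\mathbf{F}_{\xi_1}\,d\mathbf{F}_{\xi_2} \\
&= \int g_1^{+}\,d\mathbf{F}_{\xi_1} \int g_2^{+}\,d\mathbf{F}_{\xi_2} - \int g_1^{+}\,d\mathbf{F}_{\xi_1} \int g_2^{-}\,d\mathbf{F}_{\xi_2} \\
&\quad - \int g_1^{-}\,d\mathbf{F}_{\xi_1} \int g_2^{+}\,d\mathbf{F}_{\xi_2} + \int g_1^{-}\,d\mathbf{F}_{\xi_1} \int g_2^{-}\,d\mathbf{F}_{\xi_2} \\
&= \int g_1\,d\mathbf{F}_{\xi_1} \int g_2\,d\mathbf{F}_{\xi_2}.
\end{aligned}$$

□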


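Corollary 3.6.2 In the special case when $g(x_1, x_2) = I_B(x_1, x_2)$ is the indicator
of a set $B \in \mathfrak{B}^2$, we obtain the formula for sequential computation of the measure
of $B$:

$$\mathbf{P}\bigl((\xi_1, \xi_2) \in B\bigr) = \int \mathbf{P}\bigl((x_1, \xi_2) \in B\bigr)\,\mathbf{F}_{\xi_1}(dx_1).$$

The probability of the event $\{(x_1, \xi_2) \in B\}$ could also be written as
$\mathbf{P}(\xi_2 \in B_{x_1}) = \mathbf{F}_{\xi_2}(B_{x_1})$, where $B_{x_1} = \{x_2 : (x_1, x_2) \in B\}$ is the “section” of the set $B$
at the point $x_1$. If $B = \{(x_1, x_2) : x_1 + x_2 < x\}$, we get

$$\begin{aligned}
\mathbf{P}\bigl((\xi_1, \xi_2) \in B\bigr) &= \mathbf{P}(\xi_1 + \xi_2 < x) = F_{\xi_1 + \xi_2}(x) \\
&= \int \mathbf{P}(x_1 + \xi_2 < x)\,\mathbf{F}_{\xi_1}(dx_1) \\
&= \int F_{\xi_2}(x - x_1)\,dF_{\xi_1}(x_1).
\end{aligned} \tag{3.6.10}$$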

We have obtained a formula for the distribution function of the sum of independent
random variables expressing $F_{\xi_1 + \xi_2}$ in terms of $F_{\xi_1}$ and $F_{\xi_2}$. The integral on the
right-hand side of (3.6.10) is called the convolution of the distribution functions
$F_{\xi_1}(x)$ and $F_{\xi_2}(x)$ and is denoted by $F_{\xi_1} * F_{\xi_2}(x)$. In the same way one can obtain
the equality

$$F_{\xi_1 + \xi_2}(x) = \int_{-\infty}^{\infty} F_{\xi_1}(x - t)\,dF_{\xi_2}(t).$$

Observe that the right-hand side here could also be considered as a result of integrating

$$\int dF_{\xi_1}(t)\,F_{\xi_2}(x - t)$$

by parts.

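If at least one of the distribution functions has a density, the convolution also
has a density. This follows immediately from the formulas for convolution. Let, for
instance,

$$F_{\xi_2}(x) = \int_{-\infty}^{x} f_{\xi_2}(u)\,du.$$

Then

$$\begin{aligned}
F_{\xi_1 + \xi_2}(x) &= \int_{-\infty}^{\infty} \mathbf{F}_{\xi_1}(dt) \int_{-\infty}^{x} f_{\xi_2}(u - t)\,du \\
&= \int_{-\infty}^{x} \Bigl(\int_{-\infty}^{\infty} \mathbf{F}_{\xi_1}(dt)\,f_{\xi_2}(u - t)\Bigr)du,
\end{aligned}$$

so that the density of the sum $\xi_1 + \xi_2$ equals

$$f_{\xi_1 + \xi_2}(x) = \int_{-\infty}^{\infty} \mathbf{F}_{\xi_1}(dt)\,f_{\xi_2}(x - t) = \int_{-\infty}^{\infty} f_{\xi_2}(x - t)\,dF_{\xi_1}(t).$$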

Example 3.6.1 Let $\xi_1, \xi_2, \ldots$ be independent random variables uniformly distributed
over $[0, 1]$, i.e. $\xi_1, \xi_2, \ldots$ have the same distribution function with density

$$f(x) = \begin{cases} 1, & x \in [0, 1], \\ 0, & x \notin [0, 1]. \end{cases} \tag{3.6.11}$$

Then the density of the sum $\xi_1 + \xi_2$ is

$$f_{\xi_1 + \xi_2}(x) = \int_0^1 f(x - t)\,dt = \begin{cases} 0, & x \notin [0, 2], \\ x, & x \in [0, 1], \\ 2 - x, & x \in [1, 2]. \end{cases} \tag{3.6.12}$$

The integral present here is clearly the length of the intersection of the segments
$[0, 1]$ and $[x - 1, x]$. The graph of the density of the sum $\xi_1 + \xi_2 + \xi_3$ will consist
of three pieces of parabolas:

$$f_{\xi_1 + \xi_2 + \xi_3}(x) = \int_0^1 f_{\xi_1 + \xi_2}(x - t)\,dt = \begin{cases} 0, & x \notin [0, 3], \\ \dfrac{x^2}{2}, & x \in [0, 1], \\ 1 - \dfrac{(2 - x)^2}{2} - \dfrac{(x - 1)^2}{2}, & x \in [1, 2], \\ \dfrac{(3 - x)^2}{2}, & x \in [2, 3]. \end{cases}$$

Fig. 3.2 Illustration to Example 3.6.1. The upper row visualizes the computation of the
convolution integral for the density of $\xi_1 + \xi_2 + \xi_3$. The lower row displays the densities
of $\xi_1$, $\xi_1 + \xi_2$, and $\xi_1 + \xi_2 + \xi_3$, respectively

The computation of this integral is visualised in Fig. 3.2, where the shaded areas
correspond to the values of $f_{\xi_1 + \xi_2 + \xi_3}(x)$ for different $x$. The shape of the densities
of $\xi_1$, $\xi_1 + \xi_2$ and $\xi_1 + \xi_2 + \xi_3$ is shown in Fig. 3.2b. The graph of the density of the
sum $\xi_1 + \xi_2 + \xi_3 + \xi_4$ will consist of four pieces of cubic parabolas and so on. If
we shift the origin to the point $n/2$, then, as $n$ increases, the shape (up to a scaling
transformation) of the density of the sum $\xi_1 + \cdots + \xi_n$ will be closer and closer to
that of the function $e^{-x^2}$. We will see below that this is not due to chance.
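
The piecewise formulas of Example 3.6.1 can be checked numerically. The sketch below is an illustration only: the helper names and the midpoint rule with 2000 cells are our assumptions, not part of the text. It integrates the triangular density of $\xi_1 + \xi_2$ over $[x - 1, x]$ and compares the result with the three parabolic pieces of the density of $\xi_1 + \xi_2 + \xi_3$.

```python
def f2(x):
    """Triangular density of xi_1 + xi_2 for independent uniform [0, 1] summands."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0


def f3_closed(x):
    """Closed form of the density of xi_1 + xi_2 + xi_3 (three parabolic pieces)."""
    if 0.0 <= x <= 1.0:
        return x * x / 2.0
    if 1.0 < x <= 2.0:
        return 1.0 - (2.0 - x) ** 2 / 2.0 - (x - 1.0) ** 2 / 2.0
    if 2.0 < x <= 3.0:
        return (3.0 - x) ** 2 / 2.0
    return 0.0


def f3_numeric(x, steps=2000):
    """Midpoint rule for the convolution integral of f2(x - t) over t in [0, 1]."""
    h = 1.0 / steps
    return h * sum(f2(x - (j + 0.5) * h) for j in range(steps))


# Compare the two versions on the grid 0, 0.25, ..., 3.
grid = [0.25 * i for i in range(13)]
max_gap = max(abs(f3_numeric(x) - f3_closed(x)) for x in grid)
print(max_gap)
```

On this grid every kink of the triangular density falls on a cell boundary, so the midpoint rule is exact up to rounding and the printed gap is at the level of floating-point error.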

In connection with this example we could note that if $\xi$ and $\eta$ are two independent
random variables, $\xi$ having the distribution function $F(x)$ and $\eta$ being uniformly
distributed over $[0, 1]$, then the density of the sum $\xi + \eta$ at the point $x$ is equal to

$$f_{\xi + \eta}(x) = \int dF(t)\,f_\eta(x - t) = \int_{x-1}^{x} dF(t) = F(x) - F(x - 1).$$
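
A quick way to try this formula is to take $\xi$ itself uniform on $[0, 1]$, in which case $F(x) - F(x - 1)$ should reproduce the triangular density of the sum of two uniform variables. The Python sketch below does this at a few points; the helper names are hypothetical and serve this illustration only.

```python
import math


def uniform_cdf(x):
    """Distribution function of the uniform law on [0, 1]."""
    return min(max(x, 0.0), 1.0)


def density_of_sum_with_uniform(F, x):
    """Density at x of xi + eta, eta uniform on [0, 1] and independent of xi with d.f. F."""
    return F(x) - F(x - 1.0)


def triangle(x):
    """Triangular density of the sum of two independent uniform [0, 1] variables."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0


points = [-0.5, 0.0, 0.3, 1.0, 1.4, 2.0, 2.5]
values = [density_of_sum_with_uniform(uniform_cdf, x) for x in points]
print(values)
```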


Chapter  4 

Numerical  Characteristics  of  Random  Variables 


Abstract This chapter opens with Sect. 4.1 introducing the concept of the expectation
of a random variable as the respective Lebesgue integral and deriving its key
properties, illustrated by a number of examples. Then the concepts of conditional
distribution functions and conditional expectations given an event are presented and
discussed in detail in Sect. 4.2, one of the illustrations introducing the ruin problem
for the simple random walk. In Sects. 4.3 and 4.4, expectations of independent
random variables and those of sums of random numbers of random variables are
considered. In Sect. 4.4, the Kolmogorov-Prokhorov theorem is proved for the case
when the number of random terms in the sum is independent of the future, followed
by the derivation of Wald’s identity. After that, moments of higher orders
are introduced and discussed, starting with the variance in Sect. 4.5 and proceeding
to covariance and the correlation coefficient and their key properties in Sect. 4.6.
Section 4.7 is devoted to the fundamental moment inequalities: Cauchy-Bunjakovsky’s
inequality (a.k.a. the Cauchy-Schwarz inequality), Hölder’s and Jensen’s inequalities,
followed by inequalities for probabilities (Markov’s and Chebyshev’s inequalities).
Section 4.8 extends the concept of conditional expectation (given a random variable
or sigma-algebra), starting with the discrete case, then turning to square-integrable
random variables and using projections, and finally considering the general case
based on the Radon-Nikodym theorem (proved in Appendix 3). The properties of
the conditional expectation are studied, followed by the introduction of the concept of
conditional distribution given a random variable, illustrated by several examples
in Sect. 4.9.


4.1  Expectation 

Definition 4.1.1 The (mathematical) expectation, or mean value, of a random variable
$\xi$ given on a probability space $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$ is defined as the quantity

$$\mathbf{E}\xi = \int_{\Omega} \xi(\omega)\,\mathbf{P}(d\omega).$$

Let $\xi^{\pm} = \max(0, \pm\xi)$. The values $\mathbf{E}\xi^{\pm} \ge 0$ are always well defined (see Appendix 3).
We will say that $\mathbf{E}\xi$ exists if $\max(\mathbf{E}\xi^{+}, \mathbf{E}\xi^{-}) < \infty$.

We will say that $\mathbf{E}\xi$ is well defined if $\min(\mathbf{E}\xi^{+}, \mathbf{E}\xi^{-}) < \infty$. In this case the
difference $\mathbf{E}\xi^{+} - \mathbf{E}\xi^{-}$ is always well defined, but $\mathbf{E}\xi = \mathbf{E}\xi^{+} - \mathbf{E}\xi^{-}$ may be $\pm\infty$.


By virtue of the above remarks (see Sect. 3.6) one can also define $\mathbf{E}\xi$ as

$$\mathbf{E}\xi = \int x\,\mathbf{F}_\xi(dx) = \int x\,dF(x), \tag{4.1.1}$$

where $F(x)$ is the distribution function of $\xi$. It follows from the definition that $\mathbf{E}\xi$
exists if $\mathbf{E}|\xi| < \infty$. It is not hard to see that $\mathbf{E}\xi$ does not exist if, for instance,
$1 - F(x) > 1/x$ for all sufficiently large $x$.

We already know that if $F(x)$ is a step function then the Stieltjes integral (4.1.1)
becomes the sum

$$\mathbf{E}\xi := \sum_k x_k\,\mathbf{P}(\xi = x_k).$$

If $F(x)$ has a density $f(x)$, then

$$\mathbf{E}\xi = \int x f(x)\,dx,$$

so that $\mathbf{E}\xi$ is the point of the “centre of gravity” of the distribution $\mathbf{F}$ of the unit
mass on the real line and corresponds to the natural interpretation of the mean value
of the distribution.

If $g(x)$ is a Borel function, then $\eta = g(\xi)$ is again a random variable and

$$\mathbf{E}g(\xi) = \int g(\xi(\omega))\,\mathbf{P}(d\omega) = \int g(x)\,dF(x) = \int x\,dF_{g(\xi)}(x).$$

The last equality follows from definition (4.1.1).

The  basic  properties  of  expectations  coincide  with  those  of  the  integral: 


E1. If $a$ and $b$ are constants, then $\mathbf{E}(a + b\xi) = a + b\,\mathbf{E}\xi$.

E2. $\mathbf{E}(\xi_1 + \xi_2) = \mathbf{E}(\xi_1) + \mathbf{E}(\xi_2)$, if any two of the expectations appearing in the
formula exist.

E3. If $a \le \xi \le b$, then $a \le \mathbf{E}\xi \le b$. The inequality $\mathbf{E}\xi \le \mathbf{E}|\xi|$ always holds.

E4. If $\xi \ge 0$ and $\mathbf{E}\xi = 0$, then $\xi = 0$ with probability 1.

E5. The probability of an event $A$ can be expressed in terms of expectations as

$$\mathbf{P}(A) = \mathbf{E}\,I(A),$$

where $I(A)$ is the random variable equal to the indicator of the event $A$:
$I(A) = 1$ if $\omega \in A$ and $I(A) = 0$ otherwise.


For  further  properties  of  expectations,  see  Appendix  3. 
We  consider  several  examples. 


Example 4.1.1 Expectations related to the Bernoulli scheme. Let $\xi \subset\!\!\!= \mathbf{B}_p$, i.e. $\xi$
assumes two values: 0 with probability $q$ and 1 with probability $p$, where $p + q = 1$.
Then

$$\mathbf{E}\xi = 0 \times \mathbf{P}(\xi = 0) + 1 \times \mathbf{P}(\xi = 1) = p.$$

Now consider a sequence of trials in the Bernoulli scheme until the time of
the first “success”. In other words, consider a sequence of independent variables
$\xi_1, \xi_2, \ldots$ distributed as $\xi$ until the time

$$\eta := \min\{k \ge 1 : \xi_k = 1\}.$$

It is evident that $\eta$ is a random variable,

$$\mathbf{P}(\eta = k) = q^{k-1} p, \quad k \ge 1,$$

so that $\eta - 1$ has the geometric distribution. Consequently,

$$\mathbf{E}\eta = \sum_{k=1}^{\infty} k q^{k-1} p = \frac{p}{(1 - q)^2} = \frac{1}{p}.$$

If we put $S_n := \sum_{k=1}^{n} \xi_k$ then clearly $\mathbf{E}S_n = np$. Now define, for an integer $N \ge 1$,
the random variable $\eta = \min\{k \ge 1 : S_k = N\}$ as the “first passage time” of level $N$
by the sequence $S_n$. One has

$$\mathbf{P}(\eta = k) = \mathbf{P}(S_{k-1} = N - 1)\,p,$$

$$\mathbf{E}\eta = \frac{p^N}{(N-1)!} \sum_{k=N}^{\infty} k(k - 1) \cdots (k - N + 1)\,q^{k-N}.$$

The sum here is equal to the $N$-th derivative of the function $\sum_{k=0}^{\infty} z^k = 1/(1 - z)$
at the point $z = q$, i.e. it equals $N!/p^{N+1}$. Thus $\mathbf{E}\eta = N/p$. As we will
see below, this equality could be obtained as an obvious consequence of the results
of Sect. 4.4.
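
The value $N/p$ for the mean first passage time can be confirmed by summing the defining series directly, using $\mathbf{P}(\eta = k) = \binom{k-1}{N-1} p^N q^{k-N}$ for $k \ge N$. The following Python sketch does so; the function name and the truncation of the series after 2000 terms are assumptions made for this illustration, and the tail is negligible for the parameter values tried.

```python
from math import comb


def mean_first_passage(N, p, terms=2000):
    """Truncated series for the mean of eta = min{k >= 1 : S_k = N} in the Bernoulli scheme.

    Uses P(eta = k) = C(k-1, N-1) * p**N * q**(k-N) for k >= N.
    """
    q = 1.0 - p
    return sum(
        k * comb(k - 1, N - 1) * p**N * q ** (k - N)
        for k in range(N, N + terms)
    )


# For N = 3 and p = 1/4 the closed form N / p equals 12.
print(mean_first_passage(3, 0.25), 3 / 0.25)
# For N = 1 this is the mean time of the first success, 1 / p.
print(mean_first_passage(1, 0.25), 1 / 0.25)
```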


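Example 4.1.2 If $\xi \subset\!\!\!= \boldsymbol{\Phi}_{a, \sigma^2}$ then

$$\begin{aligned}
\mathbf{E}\xi &= \int t\,\varphi_{a, \sigma^2}(t)\,dt = \frac{1}{\sigma\sqrt{2\pi}} \int t\,e^{-\frac{(t - a)^2}{2\sigma^2}}\,dt \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int (t - a)\,e^{-\frac{(t - a)^2}{2\sigma^2}}\,dt + \frac{a}{\sigma\sqrt{2\pi}} \int e^{-\frac{(t - a)^2}{2\sigma^2}}\,dt \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int z\,e^{-\frac{z^2}{2\sigma^2}}\,dz + a = a.
\end{aligned}$$

Thus the parameter $a$ of the normal law is equal to the expectation of the latter.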


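Example 4.1.3 If $\xi \subset\!\!\!= \boldsymbol{\Pi}_\mu$ then $\mathbf{E}\xi = \mu$. Indeed,

$$\mathbf{E}\xi = \sum_{k=0}^{\infty} k\,\frac{\mu^k}{k!}\,e^{-\mu} = \mu e^{-\mu} \sum_{k=1}^{\infty} \frac{\mu^{k-1}}{(k-1)!} = \mu.$$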


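Example 4.1.4 If $\xi \subset\!\!\!= \mathbf{U}_{0,1}$, then

$$\mathbf{E}\xi = \int_0^1 x\,dx = \frac{1}{2}.$$

It follows from property E1 that, for $\xi \subset\!\!\!= \mathbf{U}_{a,b}$, one has

$$\mathbf{E}\xi = a + \frac{b - a}{2} = \frac{a + b}{2}.$$

If $\xi \subset\!\!\!= \mathbf{K}_{0,1}$ then the expectation $\mathbf{E}\xi$ does not exist. That follows from the fact
that the integral $\int \frac{|x|\,dx}{1 + x^2}$ diverges.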


Example 4.1.5 We now consider an example that is, in a sense, close to Example 4.1.1
on the computation of $\mathbf{E}\eta$, but which is more complex and corresponds
to computing the mean value of the duration of a chess tournament in a real-life
situation. In Sect. 3.4 we described a simple probabilistic model of a chess tournament.
The first player wins in a given game, independently of the outcomes of the
previous games, with probability $p$, loses with probability $q$, $p + q < 1$, and makes
a tie with probability $1 - p - q$. Of course, this is a rather rough first approximation
since in a real-life tournament there is apparently no independence. On the other
hand, it is rather unlikely that, for balanced high level players, the above probabilities
would substantially vary from game to game or depend on the outcomes of their
previous games. A more complex model incorporating dependence of $p$ and $q$ on
the outcomes of previous games will be considered in Example 13.4.2.

Assume that the tournament continues until one of the two participants wins $N$
games (then this player will be declared the winner). For instance, the 1984 individual
World Championship match between A. Karpov and G. Kasparov was organised
just according to this scheme with $N = 6$. What can one say about the expectation
$\mathbf{E}\eta$ of the duration $\eta$ of the tournament?

As was shown in Example 3.3.1,

$$\mathbf{P}(\eta = n) = p \sum_{i=0}^{N-1} p(n - 1;\, N - 1,\, i) + q \sum_{i=0}^{N-1} p(n - 1;\, i,\, N - 1),$$

where

$$p(n;\, i, j) = \frac{n!}{i!\,j!\,(n - i - j)!}\,p^i q^j (1 - p - q)^{n-i-j}.$$

Therefore, under obvious conventions on the summation indices,

$$\mathbf{E}\eta = \frac{1}{(N - 1)!} \sum_{i=0}^{N-1} \frac{p^N q^i + p^i q^N}{i!} \sum_{n=0}^{\infty} n(n - 1) \cdots (n - i - N + 1)(1 - p - q)^{n-i-N}.$$

The sum over $n$ was calculated in Example 4.1.1 to be $(N + i)!/(p + q)^{N+i+1}$.
Consequently,

$$\mathbf{E}\eta = \frac{N}{p + q} \sum_{i=0}^{N-1} \frac{(p^N q^i + p^i q^N)(N + i)!}{i!\,N!\,(p + q)^{i+N}} = \frac{N}{p + q} \sum_{i=0}^{N-1} \binom{N + i}{i}\bigl[r^N (1 - r)^i + r^i (1 - r)^N\bigr],$$

where $r = p/(p + q)$.

In his interview of 3 March 1985 to the newspaper “Izvestija”, Karpov said
that in qualifying tournaments he would lose, on average, 1 game out of 20, and
that Kasparov’s results were similar. If we put in our simple model $p = q = 1/20$
(strictly speaking, one cannot make such a conclusion from the given data; the relation
$p = q = 1/20$ should be considered rather as one of many possible hypotheses)
then, for $N = 6$, direct calculations show that ($r = 1/2$)

$$\mathbf{E}\eta = 60 \sum_{i=0}^{5} \binom{6 + i}{i} \frac{1}{2^{5+i}} \approx 93.$$

Thus, provided that our simplest model is adequate, the expected duration of
a tournament turns out to be very large. The fact that the match between Karpov
and Kasparov was interrupted by the decision of the chairman of the World Chess
Federation after 48 games because the match had dragged on, might serve as a
confirmation of the correctness of the assumptions we made.

Taking into account the results of the match and subsequent games between Karpov
and Kasparov could lead to estimates (approximate values) for the quantities $p$
and $q$ that would differ from $1/20$.

For our model, one also has the following simple inequality:

$$\frac{N}{p + q} \le \mathbf{E}\eta \le \frac{2N - 1}{p + q}.$$

It follows from the relation $\eta_N \le \eta \le \eta_{2N-1}$, where $\eta_N$ is the number of games until
the time when the total of the points gained by both players reaches $N$. By virtue of
Example 4.1.1, $\mathbf{E}\eta_N = N/(p + q)$.
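
The closed-form expression for the expected duration in this example, $\mathbf{E}\eta = \frac{N}{p+q} \sum_{i=0}^{N-1} \binom{N+i}{i}\bigl[r^N(1-r)^i + r^i(1-r)^N\bigr]$ with $r = p/(p+q)$, is easy to evaluate directly. The Python sketch below does this for $N = 6$, $p = q = 1/20$ and also checks the two-sided bound $N/(p+q) \le \mathbf{E}\eta \le (2N-1)/(p+q)$; the function name is hypothetical.

```python
from math import comb


def expected_match_length(N, p, q):
    """Expected number of games until one of the two players collects N wins.

    p and q are the win probabilities of the two players in a single game;
    a game is drawn with probability 1 - p - q.
    """
    r = p / (p + q)
    s = sum(
        comb(N + i, i) * (r**N * (1 - r) ** i + r**i * (1 - r) ** N)
        for i in range(N)
    )
    return N / (p + q) * s


duration = expected_match_length(6, 1 / 20, 1 / 20)
lower = 6 / (1 / 20 + 1 / 20)           # N / (p + q)
upper = (2 * 6 - 1) / (1 / 20 + 1 / 20)  # (2N - 1) / (p + q)
print(lower, duration, upper)
```

With these inputs the value is about 92.93, which rounds to 93 games and lies between the bounds 60 and 110.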


Example 4.1.6 In the problem on cells in Sects. 1.3 and 1.4, we considered the
probability that at least one of the $n$ cells in which $r$ particles are placed at random
is empty. Find the expectation of the number $S_{n,r}$ of empty cells after $r$ particles
have been placed. If $A_k$ denotes the event that the $k$-th cell is empty and $I(A_k)$ is the
indicator of this event then

$$S_{n,r} = \sum_{k=1}^{n} I(A_k), \qquad \mathbf{E}S_{n,r} = \sum_{k=1}^{n} \mathbf{P}(A_k) = n\Bigl(1 - \frac{1}{n}\Bigr)^{r}.$$
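
For small $n$ and $r$ the expected number of empty cells, $n(1 - 1/n)^r$, can be verified by enumerating all $n^r$ equally likely placements. The sketch below does this in Python; the function names and the test values $n = 4$, $r = 5$ are chosen for illustration only.

```python
from itertools import product


def expected_empty_formula(n, r):
    """Expected number of empty cells from the indicator argument: n * (1 - 1/n)**r."""
    return n * (1 - 1 / n) ** r


def expected_empty_bruteforce(n, r):
    """Average number of empty cells over all n**r equally likely placements."""
    total_empty = 0
    for placement in product(range(n), repeat=r):
        # Cells that received no particle are those missing from the placement.
        total_empty += n - len(set(placement))
    return total_empty / n**r


print(expected_empty_formula(4, 5), expected_empty_bruteforce(4, 5))
```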

Note now that $\mathbf{E}S_{n,r}$ is close to 0 if $(1 - 1/n)^r$ is small compared with $1/n$, i.e.
when $-r\ln(1 - 1/n) - \ln n$ is large. For large $n$,

$$\ln\Bigl(1 - \frac{1}{n}\Bigr) = -\frac{1}{n} + O\Bigl(\frac{1}{n^2}\Bigr),$$

and the required relation will hold if $(r - n\ln n)/n$ is large. In our case (cf. property
E4), the smallness of $\mathbf{E}S_{n,r}$ will clearly imply that of $\mathbf{P}(A) = \mathbf{P}(S_{n,r} > 0)$.


4.2 Conditional Distribution Functions and Conditional
Expectations

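Let $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$ be a probability space and $B \in \mathfrak{F}$ be an event with $\mathbf{P}(B) > 0$. Form a
new probability space $\langle \Omega, \mathfrak{F}, \mathbf{P}_B \rangle$, where $\mathbf{P}_B$ is defined for $A \in \mathfrak{F}$ by the equality

$$\mathbf{P}_B(A) := \mathbf{P}(A \mid B).$$

It is easy to verify that the probability properties P1, P2 and P3 hold for $\mathbf{P}_B$. Let
$\xi$ be a random variable on $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$. It is clearly a random variable on the space
$\langle \Omega, \mathfrak{F}, \mathbf{P}_B \rangle$ as well.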


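Definition 4.2.1 The expectation of $\xi$ in the space $\langle \Omega, \mathfrak{F}, \mathbf{P}_B \rangle$ is called the conditional
expectation of $\xi$ given $B$ and is denoted by $\mathbf{E}(\xi \mid B)$:

$$\mathbf{E}(\xi \mid B) = \int_{\Omega} \xi(\omega)\,\mathbf{P}_B(d\omega).$$

By the definition of the measure $\mathbf{P}_B$,

$$\mathbf{E}(\xi \mid B) = \int_{\Omega} \xi(\omega)\,\mathbf{P}(d\omega \mid B) = \frac{1}{\mathbf{P}(B)} \int_{\Omega} \xi(\omega)\,\mathbf{P}(d\omega \cap B) = \frac{1}{\mathbf{P}(B)} \int_{B} \xi(\omega)\,\mathbf{P}(d\omega).$$

The last integral differs from $\mathbf{E}\xi$ in that the integration in it is carried over the set $B$
only. We will denote this integral by

$$\mathbf{E}(\xi; B) := \int_{B} \xi(\omega)\,\mathbf{P}(d\omega),$$

so that

$$\mathbf{E}(\xi \mid B) = \frac{1}{\mathbf{P}(B)}\,\mathbf{E}(\xi; B).$$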

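It is not hard to see that the function

$$F(x \mid B) := \mathbf{P}_B(\xi < x) = \mathbf{P}(\xi < x \mid B)$$

is the distribution function of the random variable $\xi$ on $\langle \Omega, \mathfrak{F}, \mathbf{P}_B \rangle$.

Definition 4.2.2 The function $F(x \mid B)$ is called the conditional distribution function
of $\xi$ (in the “conditional” space $\langle \Omega, \mathfrak{F}, \mathbf{P}_B \rangle$) given $B$.

The quantity $\mathbf{E}(\xi \mid B)$ can evidently be rewritten as

$$\int x\,dF(x \mid B).$$

If the $\sigma$-algebra $\sigma(\xi)$ generated by the random variable $\xi$ does not depend on the
event $B$, then $\mathbf{P}_B(A) = \mathbf{P}(A)$ for any $A \in \sigma(\xi)$. Therefore, in that case

$$F(x \mid B) = F(x), \qquad \mathbf{E}(\xi \mid B) = \mathbf{E}\xi, \qquad \mathbf{E}(\xi; B) = \mathbf{P}(B)\,\mathbf{E}\xi. \tag{4.2.1}$$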


Let $\{B_n\}$ be a (possibly finite) sequence of disjoint events such that $\bigcup B_n = \Omega$ and
$\mathbf{P}(B_n) > 0$ for any $n$. Then

$$\mathbf{E}\xi = \int_{\Omega} \xi(\omega)\,\mathbf{P}(d\omega) = \sum_n \int_{B_n} \xi(\omega)\,\mathbf{P}(d\omega) = \sum_n \mathbf{E}(\xi; B_n) = \sum_n \mathbf{P}(B_n)\,\mathbf{E}(\xi \mid B_n). \tag{4.2.2}$$

We have obtained the total probability formula for expectations. This formula can
be rather useful.
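
On a finite sample space the total probability formula for expectations can be checked by exact arithmetic. The sketch below uses a fair die, the variable $\xi(\omega) = \omega^2$ and a three-set partition; all of these, as well as the helper names, are hypothetical choices made only to illustrate the identity $\mathbf{E}\xi = \sum_n \mathbf{P}(B_n)\,\mathbf{E}(\xi \mid B_n)$.

```python
from fractions import Fraction

# Fair die: six equally likely outcomes (illustrative sample space).
weights = {w: Fraction(1, 6) for w in range(1, 7)}


def xi(w):
    """Illustrative random variable on the die outcomes."""
    return w * w


def prob(B):
    """P(B) for a set of outcomes B."""
    return sum(weights[w] for w in B)


def expectation_on(B):
    """E(xi; B): the integral of xi over B only."""
    return sum(xi(w) * weights[w] for w in B)


def conditional_expectation(B):
    """E(xi | B) = E(xi; B) / P(B), defined when P(B) > 0."""
    return expectation_on(B) / prob(B)


partition = [{1, 2}, {3, 4, 5}, {6}]   # disjoint events whose union is the whole space

lhs = expectation_on(set(weights))                                   # E xi
rhs = sum(prob(B) * conditional_expectation(B) for B in partition)   # total probability formula
print(lhs, rhs)
```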

Example 4.2.1 Let the lifetime of a device be a random variable $\xi$ with a distribution function $F(x)$. We know that the device has already worked for $a$ units of time. What is the distribution of the residual lifetime? What is the expectation of the latter?

Clearly, in this problem we have to find $\mathbf{P}(\xi - a \ge x \mid \xi \ge a)$ and $\mathbf{E}(\xi - a \mid \xi \ge a)$. Of course, it is assumed that

$$P(a) := \mathbf{P}(\xi \ge a) > 0.$$

By the above formulae,

$$\mathbf{P}(\xi - a \ge x \mid \xi \ge a) = \frac{P(x+a)}{P(a)}, \qquad \mathbf{E}(\xi - a \mid \xi \ge a) = \frac{1}{P(a)} \int_0^\infty x\,dF(x+a).$$

It is interesting to note the following. In many applied problems, especially when one deals with the operation of complex devices consisting of a large number of reliable parts, the distribution of $\xi$ can be assumed to be exponential:

$$P(x) = \mathbf{P}(\xi \ge x) = e^{-\mu x}, \qquad \mu > 0.$$

(The reason for this will become clear later, when considering the Poisson theorem and Poisson process. Computers could serve as examples of such devices.) But, for the exponential distribution, it turns out that the residual lifetime distribution

$$\mathbf{P}(\xi - a \ge x \mid \xi \ge a) = \frac{P(x+a)}{P(a)} = e^{-\mu x} = P(x) \tag{4.2.3}$$

coincides with the lifetime distribution of a new device. In other words, a new device and a device which has already worked without malfunction for some time $a$ are, from the viewpoint of their future failure-free operation time distributions, equivalent.

It is not hard to understand that the exponential distribution (along with its discrete analogue $\mathbf{P}(\xi = k) = q^k(1-q)$, $k = 0, 1, \ldots$) is the only distribution possessing the above remarkable property. One can see that, from equality (4.2.3), we necessarily have

$$P(x+a) = P(x)P(a).$$
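As a small numerical illustration of (4.2.3), the following sketch evaluates both sides for the exponential tail and for its geometric analogue. The function names and the parameter values used in any call are assumptions made for the example only.

```python
import math

def tail(x, mu):
    """P(x) = P(xi >= x) for the exponential distribution with parameter mu."""
    return math.exp(-mu * x)

def residual_tail(x, a, mu):
    """P(xi - a >= x | xi >= a) = P(x + a) / P(a)."""
    return tail(x + a, mu) / tail(a, mu)

def geometric_tail(k, q):
    """P(xi >= k) = q**k when P(xi = k) = q**k * (1 - q), k = 0, 1, ..."""
    return q ** k

def geometric_residual_tail(k, a, q):
    """P(xi - a >= k | xi >= a) for the discrete analogue."""
    return geometric_tail(k + a, q) / geometric_tail(a, q)
```

For instance, `residual_tail(1.5, 4.0, 0.7)` and `tail(1.5, 0.7)` agree up to rounding, whatever the elapsed time `a` is.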

Example 4.2.2 Assume that $n$ machines are positioned so that the distance between the $i$-th and $j$-th machines is $a_{ij}$, $1 \le i, j \le n$. Each machine requires service from time to time (tuning, repair, etc.). Assume that the service is to be done by a single worker and that the probability that a given new call for service comes from the $j$-th machine is $p_j$ ($\sum_{j=1}^n p_j = 1$). If, for instance, the worker has just completed servicing the $i$-th machine, then with probability $p_j$ (not depending on $i$) the next machine to be served will be the $j$-th machine; the worker will then need to go to it and cover a distance of $a_{ij}$ units. What is the mean length of such a passage?

Let $B_i$ denote the event that the $i$-th machine was serviced immediately before a given passage. Then $\mathbf{P}(B_i) = p_i$, and the probability that the worker will move from the $i$-th machine to the $j$-th machine, $j = 1, \ldots, n$, is equal to $p_j$. The length $\xi$ of the passage is $a_{ij}$. Hence

$$\mathbf{E}(\xi \mid B_i) = \sum_{j=1}^n p_j a_{ij},$$

and by the total probability formula

$$\mathbf{E}\xi = \sum_{i=1}^n p_i\,\mathbf{E}(\xi \mid B_i) = \sum_{i,j=1}^n p_i p_j a_{ij}.$$

The obtained expression enables one to compare different variants of positioning machines from the point of view of minimisation of the quantity $\mathbf{E}\xi$ under given restrictions on $a_{ij}$. For instance, if $a_{ij} \ge 1$ and all the machines are of the same type ($p_j = 1/n$) then, provided they are positioned along a straight line (with unit steps between them), one gets $a_{ij} = |j - i|$ and

$$\mathbf{E}\xi = \frac{1}{n^2} \sum_{i,j=1}^n |j - i| = \frac{2}{n^2} \sum_{k=1}^{n-1} k(n-k) = \frac{n-1}{3}\Bigl(1 + \frac{1}{n}\Bigr),{}^{1}$$

so that, for large $n$, the value of $\mathbf{E}\xi$ is close to $n/3$. Thus, if there are $s$ calls a day then the average total distance covered daily by the worker is approximately $sn/3$. It is easy to show that positioning machines around a circle would be better but still not optimal.
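The mean passage length for the straight-line layout can be computed directly from the double sum and compared with the closed form $(n^2-1)/(3n)$, which equals $\frac{n-1}{3}(1 + \frac{1}{n})$. The helper names below are hypothetical; this is only a sketch of the computation described in the example.

```python
from fractions import Fraction

def mean_passage(p, a):
    """E xi = sum over i, j of p_i * p_j * a_ij."""
    n = len(p)
    return sum(p[i] * p[j] * a[i][j] for i in range(n) for j in range(n))

def line_layout(n):
    """Distances a_ij = |j - i| for machines on a line with unit steps."""
    return [[abs(j - i) for j in range(n)] for i in range(n)]

def mean_passage_on_line(n):
    """Mean passage length for n identical machines (p_j = 1/n) on a line."""
    p = [Fraction(1, n)] * n
    return mean_passage(p, line_layout(n))
```

The same `mean_passage` function accepts any distance matrix, so other layouts can be compared by passing a different `a`.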


${}^{1}$To compute the sum, it suffices to note that

$$\sum_{k=1}^{n-1} k(k-1) = \frac{1}{3}(n-2)(n-1)n$$

(compare the initial values and increments of both sides).

Example 4.2.3 As was already noticed, not all random variables (distributions) have expectations. The respective examples are by no means pathological: for instance, the Cauchy distribution $\mathbf{K}_{\alpha,\sigma}$ has this property. Now we will consider a problem on random walks in which there also arise random variables having no expectations. This is the problem on the so-called fair game. Two gamblers take part in the game. The initial capital of the first gambler is $z$ units. This gambler wins or loses each play of the game with probability 1/2 independently of the outcomes of the previous plays, his capital increasing or decreasing by one unit, respectively. Let $z + S_k$ be the capital of the first gambler after the $k$-th play, and let $\eta(z)$ be the number of steps until his ruin in the game versus an infinitely rich adversary, i.e.

$$\eta(z) = \min\{k : z + S_k = 0\}, \qquad \eta(0) = 0.$$

If $\inf_k S_k > -z$ (i.e. the first gambler is never ruined), we put $\eta(z) = \infty$.

First we show that $\eta(z)$ is a proper random variable, i.e. a random variable assuming finite values with probability 1. For the first gambler, this will mean that he goes bankrupt with probability 1 whatever his initial capital $z$ is. Here one could take $\Omega$ to be the “sample” space consisting of all possible sequences made up of 1 and $-1$. Each such sequence $\omega$ would describe a “trajectory” of the game. (For example, $-1$ in the $k$-th place means that the first gambler lost the $k$-th play.) We leave it to the reader as an exercise to complete the construction of the probability space $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$. Clearly, one has to do this so that the probability of any first $n$ outcomes of the game (the first $n$ components of $\omega$ are fixed) is equal to $2^{-n}$.

Put

$$u(z) := \mathbf{P}(\eta(z) < \infty), \qquad u(0) := 1,$$

and denote by $B_1$ the event that the first component of $\omega$ is 1 (the gambler won in the first play) and by $B_2$ the event that this component is $-1$ (the gambler lost). Noticing that $\mathbf{P}(\eta(z) < \infty \mid B_1) = u(z+1)$ (if the first play is won, the capital becomes $z+1$), we obtain by the total probability formula that, for $z \ge 1$,

$$u(z) = \mathbf{P}(B_1)\,\mathbf{P}(\eta(z) < \infty \mid B_1) + \mathbf{P}(B_2)\,\mathbf{P}(\eta(z) < \infty \mid B_2) = \frac{1}{2}u(z+1) + \frac{1}{2}u(z-1).$$

Putting $\delta(z) := u(z+1) - u(z)$, $z \ge 0$, we conclude from here that $\delta(z) - \delta(z-1) = 0$, and hence $\delta(z) = \delta = \text{const}$. Since

$$u(z+1) = u(0) + \sum_{k=0}^{z} \delta(k) = u(0) + (z+1)\delta$$

and $0 \le u(z+1) \le 1$ for all $z$, it is evident that $\delta$ can be nothing but 0, so that $u(z) = 1$ for all $z$.

Thus,  in  a  game  against  an  infinitely  rich  adversary,  the  gambler  will  be  ruined 
with  probability  1.  This  explains,  to  some  extent,  the  fact  that  all  reckless  gamblers 
(not  stopping  “at  the  right  time”;  choosing  this  “right  time”  is  a  separate  problem) 
go bankrupt sooner or later, even if the game is fair.

We show now that although $\eta(z)$ is a proper random variable, $\mathbf{E}\eta(z) = \infty$. Assume the contrary:

$$v(z) := \mathbf{E}\eta(z) < \infty.$$

Similarly to the previous argument, we notice that $\mathbf{E}(\eta(z) \mid B_1) = 1 + v(z+1)$ (the capital became $z+1$, one play has already been played). Therefore by the total probability formula we find for $z \ge 1$ that

$$v(z) = \frac{1}{2}\bigl(1 + v(z+1)\bigr) + \frac{1}{2}\bigl(1 + v(z-1)\bigr), \qquad v(0) = 0.$$
It can be seen from this formula that if $v(z) < \infty$, then $v(k) < \infty$ for all $k$. Set $\Delta(z) = v(z+1) - v(z)$. Then the last equality can be written down for $z \ge 1$ as

$$-1 = \frac{1}{2}\Delta(z) - \frac{1}{2}\Delta(z-1),$$

or

$$\Delta(z) = \Delta(z-1) - 2.$$

From this equality we find that $\Delta(z) = \Delta(0) - 2z$. Therefore

$$v(z) = \sum_{k=0}^{z-1} \Delta(k) = z\Delta(0) - z(z-1) = z\,v(1) - z(z-1).$$

It follows that $\mathbf{E}\eta(z) < 0$ for sufficiently large $z$. But $\eta(z)$ is a positive random variable and hence $\mathbf{E}\eta(z) > 0$. The contradiction shows that the assumption on the finiteness of the expectation of $\eta(z)$ is wrong.
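The two conclusions of Example 4.2.3 (ruin is certain, yet the mean ruin time is infinite) can be observed on the exact distribution of $\eta(z)$ over a finite horizon. The sketch below propagates the capital by dynamic programming; the horizon `n_max` and the function names are assumptions of the illustration, and a finite computation can of course only suggest the limiting behaviour.

```python
from fractions import Fraction

def ruin_time_distribution(z, n_max):
    """Return [P(eta(z) = k) for k = 0..n_max] for the fair game."""
    half = Fraction(1, 2)
    # alive: capital -> probability of holding it without having been ruined
    alive = {} if z == 0 else {z: Fraction(1)}
    dist = [Fraction(1) if z == 0 else Fraction(0)]
    for _ in range(n_max):
        nxt, ruined = {}, Fraction(0)
        for capital, pr in alive.items():
            for step in (1, -1):
                c = capital + step
                if c == 0:
                    ruined += pr * half
                else:
                    nxt[c] = nxt.get(c, Fraction(0)) + pr * half
        dist.append(ruined)
        alive = nxt
    return dist

def truncated_mean(z, n_max):
    """E min(eta(z), n_max), a lower bound for E eta(z)."""
    dist = ruin_time_distribution(z, n_max)
    return sum(k * pk for k, pk in enumerate(dist)) + n_max * (1 - sum(dist))
```

As `n_max` grows, `sum(ruin_time_distribution(z, n_max))` creeps towards 1 while `truncated_mean(z, n_max)` keeps increasing, in line with the argument above.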


4.3  Expectations  of  Functions  of  Independent  Random  Variables 

Theorem  4.3.1 

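1. Let $\xi$ and $\eta$ be independent random variables and $g(x, y)$ be a Borel function. Then if $g \ge 0$ or $\mathbf{E}g(\xi, \eta)$ is finite, then

$$\mathbf{E}g(\xi, \eta) = \mathbf{E}\bigl[\mathbf{E}g(x, \eta)\big|_{x=\xi}\bigr]. \tag{4.3.1}$$

2. Let $g(x, y) = g_1(x)g_2(y)$. If $g_1(\xi) \ge 0$ and $g_2(\eta) \ge 0$, or both $\mathbf{E}g_1(\xi)$ and $\mathbf{E}g_2(\eta)$ exist, then

$$\mathbf{E}g(\xi, \eta) = \mathbf{E}g_1(\xi)\,\mathbf{E}g_2(\eta). \tag{4.3.2}$$

The expectation $\mathbf{E}g(\xi, \eta)$ exists if and only if both $\mathbf{E}g_1(\xi)$ and $\mathbf{E}g_2(\eta)$ exist. (We exclude here the trivial cases $\mathbf{P}(g_1(\xi) = 0) = 1$ and $\mathbf{P}(g_2(\eta) = 0) = 1$ to avoid trivial complications.)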


Proof The first assertion of the theorem is a paraphrase of Fubini’s theorem in terms of expectations. The first part of the second assertion follows immediately from Corollary 3.6.1 of Fubini’s theorem. Since $|g_1(\xi)| \ge 0$ and $|g_2(\eta)| \ge 0$ and these random variables are independent, one has

$$\mathbf{E}|g_1(\xi)g_2(\eta)| = \mathbf{E}|g_1(\xi)|\,\mathbf{E}|g_2(\eta)|.$$

Now the last assertion of the theorem follows immediately, for one clearly has $\mathbf{E}|g_1(\xi)| \ne 0$, $\mathbf{E}|g_2(\eta)| \ne 0$. □


Remark 4.3.1 Formula (4.3.1) could be considered as the total probability formula for computing the expectation $\mathbf{E}g(\xi, \eta)$. Assertion (4.3.2) could be written down without loss of generality in the form

$$\mathbf{E}\xi\eta = \mathbf{E}\xi\,\mathbf{E}\eta. \tag{4.3.3}$$


To get (4.3.2) from this, one has to take $g_1(\xi)$ instead of $\xi$ and $g_2(\eta)$ instead of $\eta$; these will again be independent random variables.

Examples of the use of Theorem 4.3.1 were given in Sect. 3.6 and will appear in the sequel.

The converse to (4.3.2) or (4.3.3) does not hold. There exist dependent random variables $\xi$ and $\eta$ such that

$$\mathbf{E}\xi\eta = \mathbf{E}\xi\,\mathbf{E}\eta.$$

Let, for instance, $\zeta$ and $\xi$ be independent and $\mathbf{E}\xi = \mathbf{E}\zeta = 0$. Put $\eta = \xi\zeta$. Then $\xi$ and $\eta$ are clearly dependent (excluding some trivial cases when, say, $\xi = \text{const}$), but

$$\mathbf{E}\xi\eta = \mathbf{E}\xi^2\zeta = \mathbf{E}\xi^2\,\mathbf{E}\zeta = 0 = \mathbf{E}\xi\,\mathbf{E}\eta.$$
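A concrete instance of this construction can be enumerated exactly. In the sketch below the laws of $\xi$ and $\zeta$ are hypothetical choices made for the illustration; $|\xi|$ is taken non-constant on purpose, so that $\eta = \xi\zeta$ visibly depends on $\xi$.

```python
from fractions import Fraction
from itertools import product

xi_values = [-2, -1, 1, 2]   # assumed law of xi: uniform, E xi = 0
zeta_values = [-1, 1]        # assumed law of zeta: uniform, E zeta = 0

# Joint law of (xi, eta) with eta = xi * zeta, xi and zeta independent.
joint = {}
for x, z in product(xi_values, zeta_values):
    pr = Fraction(1, len(xi_values) * len(zeta_values))
    joint[(x, x * z)] = joint.get((x, x * z), Fraction(0)) + pr

def expect(f):
    """Expectation of f(xi, eta) under the joint law."""
    return sum(pr * f(x, y) for (x, y), pr in joint.items())

e_xi = expect(lambda x, y: x)
e_eta = expect(lambda x, y: y)
e_product = expect(lambda x, y: x * y)

# Dependence: the event {xi = 2, eta = 1} is impossible, although
# both {xi = 2} and {eta = 1} have positive probability.
p_joint = joint.get((2, 1), Fraction(0))
p_xi2 = sum(pr for (x, _), pr in joint.items() if x == 2)
p_eta1 = sum(pr for (_, y), pr in joint.items() if y == 1)
```

Here `e_product` equals `e_xi * e_eta`, while `p_joint` differs from `p_xi2 * p_eta1`, so the product formula holds without independence.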


4.4  Expectations  of  Sums  of  a  Random  Number  of  Random 
Variables 

Assume that a sequence $\{\xi_n\}_{n=1}^\infty$ of independent random variables (or random vectors) and an integer-valued random variable $\nu \ge 0$ are given on a probability space $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$. Property E2 of expectations implies that, for sums $S_n = \sum_{i=1}^n \xi_i$, the following equality holds:

$$\mathbf{E}S_n = \sum_{i=1}^n \mathbf{E}\xi_i.$$

In particular, if $a_k = \mathbf{E}\xi_k = a$ do not depend on $k$ then $\mathbf{E}S_n = an$.

What can be said about the expectation of the sum $S_\nu$ of the random number $\nu$ of random variables $\xi_1, \xi_2, \ldots$? To answer this question we need to introduce some new notions.

Let $\mathfrak{F}_{k,n} := \sigma(\xi_k, \ldots, \xi_n)$ be the $\sigma$-algebra generated by the $n - k + 1$ random variables $\xi_k, \ldots, \xi_n$.

Definition 4.4.1 A random variable $\nu$ is said to be independent of the future if the event $\{\nu \le n\}$ does not depend on $\mathfrak{F}_{n+1,\infty}$.

Let, further, a family of embedded $\sigma$-algebras $\mathfrak{F}_n$: $\mathfrak{F}_n \subset \mathfrak{F}_{n+1}$ be given, such that

$$\mathfrak{F}_{1,n} = \sigma(\xi_1, \ldots, \xi_n) \subset \mathfrak{F}_n.$$

Definition 4.4.2 A random variable $\nu$ is said to be a Markov (or stopping) time with respect to the family $\{\mathfrak{F}_n\}$, if $\{\nu \le n\} \in \mathfrak{F}_n$.

Often $\mathfrak{F}_n$ is taken to be $\mathfrak{F}_{1,n} = \sigma(\xi_1, \ldots, \xi_n)$. We will call a stopping time with respect to $\mathfrak{F}_{1,n}$ simply a stopping (or Markov) time without indicating the corresponding family of $\sigma$-algebras. In this case, knowing the values of $\xi_1, \ldots, \xi_n$ enables us to say whether the event $\{\nu \le n\}$ has occurred or not.

If the $\xi_k$ are independent (the $\sigma$-algebras $\mathfrak{F}_{1,n}$ and $\mathfrak{F}_{n+1,\infty}$ are independent) then the requirement of independence of the future is wider than the Markov property, because if $\nu$ is a stopping time with respect to $\{\mathfrak{F}_{1,n}\}$ then, evidently, the random variable $\nu$ does not depend on the future.

As for a converse statement, one can only assert the following. If $\nu$ does not depend on the future and the $\xi_k$ are independent then one can construct a family of embedded $\sigma$-algebras $\{\mathfrak{F}_n\}$, $\mathfrak{F}_n \supset \mathfrak{F}_{1,n}$, such that $\nu$ is a stopping time with respect to $\mathfrak{F}_n$ ($\{\nu \le n\} \in \mathfrak{F}_n$) and does not depend on $\mathfrak{F}_{n+1,\infty}$. As $\mathfrak{F}_n$, we can take the $\sigma$-algebra generated by $\mathfrak{F}_{1,n}$ and the events $\{\nu = k\}$ for $k \le n$. For instance, a random variable $\nu$ independent of $\{\xi_j\}$ clearly does not depend on the future, but is not a stopping time. Such $\nu$ will be a stopping time only with respect to the family $\{\mathfrak{F}_n\}$ constructed above.

It should be noted that, formally, any random variable can be made a stopping time using the above construction (but, generally speaking, there will be no independence of $\mathfrak{F}_n$ and $\mathfrak{F}_{n+1,\infty}$). However, such a construction is unsubstantial and not particularly useful. In all the examples below the variables $\nu$ not depending on the future are stopping times defined in a rather natural way.

Example 4.4.1 Let $\nu$ be the number of the first random variable in the sequence $\{\xi_n\}_{n=1}^\infty$ which is greater than or equal to $N$, i.e. $\nu = \inf\{k : \xi_k \ge N\}$. Clearly, $\nu$ is a stopping time, since

$$\{\nu \le n\} = \bigcup_{k=1}^n \{\xi_k \ge N\} \in \mathfrak{F}_{1,n}.$$

If the $\xi_k$ are independent, then evidently $\nu$ is independent of the future.

The same can be said about the random variable

$$\eta(N) := \min\{k : S_k \ge N\}, \qquad S_k = \sum_{j=1}^k \xi_j.$$

Note that the random variables $\nu$ and $\eta(N)$ may be improper (e.g., $\eta(N)$ is not defined on the event $\{S := \sup_k S_k < N\}$). The random variable $\theta := \min\{k : S_k = S\}$ is not a stopping time and depends on the future.

The term “Markov” random variable (or Markov time) will become clearer after introducing the notion of Markovity in Chap. 13. The term “stopping time” is related to the nature of a large number of applied problems in which such random variables arise. As a typical example, the following procedure could be considered. Let $\xi_k$ be the number of defective items in the $k$-th lot produced by a factory. Statistical quality control is carried out as follows. The whole production is rejected if, in sequential testing of the lots, it turns out that, for some $n$, the value of the sum

$$S_n = \sum_{k=1}^n \xi_k$$

exceeds a given admissible level $a + bn$. The lot number $\nu$ for which this happens,

$$\nu := \min\{n : S_n > a + bn\},$$

is a stopping time for the whole testing procedure. To avoid a lengthy testing, one also introduces a (literal) stopping time

$$\nu^* := \min\{n : S_n < -A + bn\},$$

where $A > 0$ is chosen so large as to guarantee, with a high probability, a sufficient quality level for the whole production (assuming, say, that the $\xi_k$ are identically distributed). It is clear that $\nu$ and $\nu^*$ both satisfy the definition of a Markov or stopping time.


Consider the sum $S_\nu = \xi_1 + \cdots + \xi_\nu$ of a random number of random variables. This sum is also called a stopped sum in the case when $\nu$ is a stopping time.


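Theorem 4.4.1 (Kolmogorov-Prokhorov) Let an integer-valued random variable $\nu$ be independent of the future. If

$$\sum_{k=1}^\infty \mathbf{P}(\nu \ge k)\,\mathbf{E}|\xi_k| < \infty \tag{4.4.1}$$

then

$$\mathbf{E}S_\nu = \sum_{k=1}^\infty \mathbf{P}(\nu \ge k)\,\mathbf{E}\xi_k. \tag{4.4.2}$$

If $\xi_k \ge 0$ then condition (4.4.1) is superfluous.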


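Proof The summand $\xi_k$ is present in the sum $S_\nu$ if and only if the event $\{\nu \ge k\}$ occurs. Thus the following representation holds for the sum $S_\nu$:

$$S_\nu = \sum_{k=1}^\infty \xi_k\,\mathrm{I}(\nu \ge k),$$

where $\mathrm{I}(B)$ is the indicator of the event $B$. Put $S_{\nu,n} := \sum_{k=1}^n \xi_k\,\mathrm{I}(\nu \ge k)$. If $\xi_k \ge 0$ then $S_{\nu,n} \uparrow S_\nu$ for each $\omega$ as $n \to \infty$, and hence, by the monotone convergence theorem (see Theorem A3.3.1 in Appendix 3),

$$\mathbf{E}S_\nu = \lim_{n\to\infty} \mathbf{E}S_{\nu,n} = \lim_{n\to\infty} \sum_{k=1}^n \mathbf{E}\xi_k\,\mathrm{I}(\nu \ge k).$$

But the event $\{\nu \ge k\}$ complements the event $\{\nu \le k-1\}$ and therefore does not depend on $\sigma(\xi_k, \xi_{k+1}, \ldots)$ and, in particular, on $\sigma(\xi_k)$. Hence, putting $a_k := \mathbf{E}\xi_k$ we get $\mathbf{E}\xi_k\,\mathrm{I}(\nu \ge k) = a_k\,\mathbf{P}(\nu \ge k)$, and

$$\mathbf{E}S_\nu = \lim_{n\to\infty} \sum_{k=1}^n a_k\,\mathbf{P}(\nu \ge k) = \sum_{k=1}^\infty a_k\,\mathbf{P}(\nu \ge k). \tag{4.4.3}$$

This proves (4.4.2) for $\xi_k \ge 0$.

Now assume $\xi_k$ can take values of both signs. Put

$$a_k^* := \mathbf{E}|\xi_k|, \qquad Z_n := \sum_{k=1}^n |\xi_k|, \qquad Z_{\nu,n} := \sum_{k=1}^n |\xi_k|\,\mathrm{I}(\nu \ge k).$$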
fc=l 
Applying (4.4.3), we obtain by virtue of (4.4.1) that

$$\mathbf{E}Z_\nu = \sum_{k=1}^\infty a_k^*\,\mathbf{P}(\nu \ge k) < \infty.$$

Since $|S_{\nu,n}| \le Z_{\nu,n} \le Z_\nu$, by the dominated convergence theorem (see Corollary 6.1.3 or (the Fatou-Lebesgue) Theorem A3.3.2 in Appendix 3) we have

$$\mathbf{E}S_\nu = \lim_{n\to\infty} \mathbf{E}S_{\nu,n} = \sum_{k=1}^\infty a_k\,\mathbf{P}(\nu \ge k),$$

where the series on the right-hand side absolutely converges by virtue of (4.4.1). The theorem is proved. □
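Identity (4.4.2) can be verified by brute force for a bounded stopping time. The sketch below enumerates all outcomes of four independent, non-identically distributed summands; the laws of the $\xi_k$, the truncation level `N` and the stopping rule are hypothetical choices for the illustration only.

```python
from fractions import Fraction
from itertools import product

N = 4
# Assumed summands: xi_k = k with probability p[k-1], and xi_k = -1 otherwise.
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4), Fraction(1, 5)]

def law(k):
    """Values and probabilities of xi_k, k = 1..N."""
    return [(k, p[k - 1]), (-1, 1 - p[k - 1])]

def stopping_time(path):
    """nu = first k with xi_k > 0, truncated at N.

    Whether nu <= n is decided by xi_1, ..., xi_n alone.
    """
    for k, x in enumerate(path, start=1):
        if x > 0:
            return k
    return N

lhs = Fraction(0)                    # E S_nu by direct enumeration
p_nu_ge = [Fraction(0)] * (N + 1)    # p_nu_ge[k] = P(nu >= k)
for outcome in product(*(law(k) for k in range(1, N + 1))):
    path = [x for x, _ in outcome]
    pr = Fraction(1)
    for _, q in outcome:
        pr *= q
    nu = stopping_time(path)
    lhs += pr * sum(path[:nu])
    for k in range(1, nu + 1):
        p_nu_ge[k] += pr

mean = [sum(x * q for x, q in law(k)) for k in range(1, N + 1)]
rhs = sum(p_nu_ge[k] * mean[k - 1] for k in range(1, N + 1))
```

Both `lhs` and `rhs` are exact rationals, so their equality is a check of (4.4.2) for this particular $\nu$, not a numerical approximation.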

Put

$$a^* := \max_k a_k, \qquad a_* := \min_k a_k,$$

where, as above, $a_k = \mathbf{E}\xi_k$.


Theorem 4.4.2 Let $\sup_k \mathbf{E}|\xi_k| < \infty$ and $\nu$ be a random variable which does not depend on the future. Then the following assertions hold true.

(a) If $\mathbf{E}\nu < \infty$ (or $\mathbf{E}Z_\nu < \infty$, where $Z_n = \sum_{k=1}^n |\xi_k|$) then $\mathbf{E}S_\nu$ exists and

$$a_*\,\mathbf{E}\nu \le \mathbf{E}S_\nu \le a^*\,\mathbf{E}\nu. \tag{4.4.4}$$

(b) If $\mathbf{E}S_\nu$ is well defined (and may be $\pm\infty$), $a_* > 0$ and, for any $N \ge 1$,

$$\mathbf{E}(S_N - a_* N;\ \nu > N) \le c,$$

where $c$ does not depend on $N$, then (4.4.4) holds true.

(c) If $\xi_k \ge 0$ then (4.4.4) is always valid.

If $S_\nu \ge \text{const}$ a.s. then condition (c) clearly implies (b).

The case $a^* < 0$ in assertions (b)-(c) can be treated in exactly the same way.

If $\nu$ does not depend on $\{\xi_k\}$, $a_* = a^* = a > 0$, then $\mathbf{E}(S_N;\ \nu > N) = aN\,\mathbf{P}(\nu > N)$ and the condition in (b) holds. But the assumption $a_* = a^*$ is inessential here, and, for independent $\nu$ and $\{\xi_k\}$, (4.4.4) is always true, since in this case

$$\mathbf{E}S_\nu = \sum_k \mathbf{P}(\nu = k)\,\mathbf{E}S_k \le a^* \sum_k k\,\mathbf{P}(\nu = k) = a^*\,\mathbf{E}\nu.$$

The reverse inequality $\mathbf{E}S_\nu \ge a_*\,\mathbf{E}\nu$ is verified in the same way.


Proof of Theorem 4.4.2
(a) First note that

$$\sum_{k=1}^\infty \mathbf{P}(\nu \ge k) = \sum_{k=1}^\infty \sum_{i=k}^\infty \mathbf{P}(\nu = i) = \sum_{i=1}^\infty i\,\mathbf{P}(\nu = i) = \mathbf{E}\nu.$$

Note also that, for $\mathbf{E}|\xi_k| \le c < \infty$, the condition $\mathbf{E}\nu < \infty$ (or $\mathbf{E}Z_\nu < \infty$) turns into condition (4.4.1), and assertion (4.4.4) follows from (4.4.2). Therefore, if $\mathbf{E}\nu < \infty$ then Theorem 4.4.2 is a direct consequence of Theorem 4.4.1. The same is true in case (d).

Consider  now  assertions  (b)  and  (c). 

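For a fixed $N > 0$, introduce the random variable

$$\nu_N := \min(\nu, N),$$

which, together with $\nu$, does not depend on the future. Indeed, if $n < N$ then the event $\{\nu_N \le n\} = \{\nu \le n\}$ does not depend on $\mathfrak{F}_{n+1,\infty}$. If $n \ge N$ then the event $\{\nu_N \le n\}$ is certain and hence it too does not depend on $\mathfrak{F}_{n+1,\infty}$.

(b) If $\mathbf{E}\nu < \infty$ then (4.4.4) is proved. Now let $\mathbf{E}\nu = \infty$. We have to prove that $\mathbf{E}S_\nu = \infty$. Since $\mathbf{E}\nu_N < \infty$, the relations

$$\mathbf{E}S_{\nu_N} = \mathbf{E}(S_\nu;\ \nu \le N) + \mathbf{E}(S_N;\ \nu > N) \ge a_*\bigl[\mathbf{E}(\nu;\ \nu \le N) + N\,\mathbf{P}(\nu > N)\bigr] \tag{4.4.5}$$

are valid by (a). Together with the conditions in (b) this implies that

$$\mathbf{E}(S_\nu;\ \nu \le N) \ge a_*\,\mathbf{E}(\nu;\ \nu \le N) - c \to \infty$$

as $N \to \infty$. Since $\mathbf{E}S_\nu$ is well defined, we have

$$\mathbf{E}(S_\nu;\ \nu \le N) \to \mathbf{E}S_\nu$$

as $N \to \infty$ (see Corollary A3.2.1 in Appendix 3). Therefore necessarily $\mathbf{E}S_\nu = \infty$.

(c) Here it is again sufficient to show that $\mathbf{E}S_\nu = \infty$ in the case when $\mathbf{E}\nu = \infty$. It follows from (4.4.5) that

$$\mathbf{E}S_\nu = \mathbf{E}(S_\nu;\ \nu \le N) + \mathbf{E}(S_\nu;\ \nu > N) \ge \mathbf{E}\bigl[S_\nu - (S_N - a_* N);\ \nu > N\bigr] + a_*\,\mathbf{E}(\nu;\ \nu \le N) \ge a_*\,\mathbf{E}(\nu;\ \nu \le N) - c \to \infty$$

as $N \to \infty$, and thus $\mathbf{E}S_\nu = \infty$.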

The  theorem  is  proved. 

Theorem  4.4.2  implies  the  following  famous  result. 

Theorem 4.4.3 (Wald’s identity) Assume $a = \mathbf{E}\xi_k$ does not depend on $k$, $\sup_k \mathbf{E}|\xi_k| < \infty$, and a random variable $\nu$ is independent of the future. Then, under at least one of the conditions (a)-(d) of Theorem 4.4.2 (with $a_*$ replaced by $a$),

$$\mathbf{E}S_\nu = a\,\mathbf{E}\nu. \tag{4.4.6}$$

If $a = 0$ and $\mathbf{E}\nu = \infty$ then identity (4.4.6) may fail to hold, since there would be an ambiguity of type $0 \cdot \infty$ on the right-hand side of (4.4.6).

Remark 4.4.1 If there is no independence of the future then equality (4.4.6) is, generally speaking, untrue. Let, for instance, $a = \mathbf{E}\xi_k < 0$, $\theta := \min\{k : S_k = S\}$ and $S := \sup_k S_k$ (see Example 4.4.1; see Chaps. 10-12 for conditions of finiteness of $\mathbf{E}S$ and $\mathbf{E}\theta$). Then $S_\theta = S \ge 0$ and $\mathbf{E}S > 0$, while $a\,\mathbf{E}\theta < 0$. Hence, (4.4.6) cannot hold true for $\nu = \theta$.

We saw that if there is no assumption on the finiteness of $\mathbf{E}\nu$ then, even in the case $a > 0$, in order for (4.4.6) to hold, additional conditions are needed, e.g., conditions (b)-(d). Without these conditions identity (4.4.6) is, generally speaking, not valid, as shown by the following example.


Example 4.4.2 Let the random variables $\zeta_k$ be independent and identically distributed, and

$$\mathbf{E}\zeta_k = 0, \qquad \mathbf{E}\zeta_k^2 = 1, \qquad \mathbf{E}|\zeta_k|^3 < \infty,$$

$$\xi_k := 1 + \ldots, \qquad \nu := \min\{k : S_k < 0\}.$$

We will show below in Example 20.2.1 that $\nu$ is a proper random variable, i.e.

$$\mathbf{P}(\nu < \infty) = 1.$$

It is also clear that $\nu$ is a Markov time independent of the future and $\mathbf{E}\xi_k = a = 1$. But one has $\mathbf{E}S_\nu < 0$, while $a\,\mathbf{E}\nu > 0$, and hence equality (4.4.6) cannot be valid. (Here necessarily $\mathbf{E}\nu = \infty$, since otherwise condition (a) would be satisfied and (4.4.6) would hold.)

However, if the $\xi_k$ are independent and identically distributed and $\nu$ is a stopping time then statement (4.4.6) is always valid whenever its right-hand side is well defined. We will show this below in Theorem 11.3.2 by virtue of the laws of large numbers.

Conditions (b) and (c) in Theorem 4.4.2 were used in the case $\mathbf{E}\nu = \infty$. However, in some problems these conditions can be used to prove the finiteness of $\mathbf{E}\nu$. The following example confirms this observation.


Example 4.4.3 Let $\xi_1, \xi_2, \ldots$ be independent and identically distributed and $a = \mathbf{E}\xi_1 > 0$. For a fixed $t > 0$, consider, as a stopping time (and a variable independent of the future), the random variable

$$\nu = \eta(t) := \min\{k : S_k \ge t\}.$$

Clearly, $S_N < t$ on the set $\{\eta(t) > N\}$ and $S_{\eta(t)} \ge t$. Therefore conditions (b) and (c) are satisfied, and hence

$$\mathbf{E}S_{\eta(t)} = a\,\mathbf{E}\eta(t).$$

We now show that $\mathbf{E}\eta(t) < \infty$. In order to do this, we consider the “trimmed” random variables $\xi_k^{(N)} := \min(N, \xi_k)$ and choose $N$ large enough for the inequality $a^{(N)} := \mathbf{E}\xi_k^{(N)} > 0$ to hold true. Let $S_k^{(N)}$ and $\eta^{(N)}(t)$ be defined similarly to $S_k$ and $\eta(t)$, but for the sequence $\{\xi_k^{(N)}\}$. Then evidently

$$S^{(N)}_{\eta^{(N)}(t)} \le t + N, \qquad \eta(t) \le \eta^{(N)}(t),$$

$$a^{(N)}\,\mathbf{E}\eta^{(N)}(t) \le t + N, \qquad \mathbf{E}\eta(t) \le \frac{t + N}{a^{(N)}} < \infty.$$
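For summands with a small finite range, both sides of Wald's identity for the first passage time $\eta(t)$ can be computed exactly by conditioning on the first summand. In the sketch below the $\xi_k$ are assumed to be rolls of a fair die (so $a = 7/2$); this choice and the function names are hypothetical and serve only to illustrate the identity.

```python
from fractions import Fraction
from functools import lru_cache

FACES = range(1, 7)              # assumed summands: rolls of a fair die
A = Fraction(sum(FACES), 6)      # a = E xi_1 = 7/2

@lru_cache(maxsize=None)
def mean_steps(t):
    """E eta(t) for integer t, eta(t) = min{k : S_k >= t}.

    For t <= 0 the level is already reached and no further rolls are needed.
    """
    if t <= 0:
        return Fraction(0)
    return 1 + sum(mean_steps(t - x) for x in FACES) / 6

@lru_cache(maxsize=None)
def mean_stopped_sum(t):
    """E S_eta(t), by the same conditioning on the first roll."""
    if t <= 0:
        return Fraction(0)
    return sum(x + mean_stopped_sum(t - x) for x in FACES) / 6
```

Since each roll is at most 6, the stopped sum overshoots the level by at most 5, which mirrors the bound $S^{(N)}_{\eta^{(N)}(t)} \le t + N$ used above for trimmed summands.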

If $a = 0$ then $\mathbf{E}\eta(t) = \infty$. This can be seen from the fair game example ($\xi_k = \pm 1$ with probability 1/2; see Example 4.2.3). In the general case, this will be shown below in Chap. 12. As was noted above, in this case the right-hand side of (4.4.6) turns into the indeterminacy $0 \cdot \infty$, but the left-hand side may equal any finite number, as in the case of the fair game where $S_{\eta(t)} = t$.

If we take $\nu$ to be the Markov time

$$\nu = \mu(t) := \min\{k : |S_k| \ge t\},$$

where $\xi_k$ may assume values of both signs, then, to prove (4.4.6), it is apparently easier to verify the condition of assertion (a) in Theorem 4.4.2. Let us show that $\mathbf{E}\mu(t) < \infty$. It is clear that, for any given $t > 0$, there exists an $N$ such that

$$q := \min\bigl[\mathbf{P}(S_N > 2t),\ \mathbf{P}(S_N < -2t)\bigr] > 0.$$

($N = 1$ if the $\xi_k$ are bounded from below.) For such $N$,

$$\inf_{|v| \le t} \mathbf{P}(|v + S_N| > t) \ge 2q.$$

Hence, in each $N$ steps, the random walk $\{S_k\}$ has a chance to leave the strip $|v| \le t$ with probability greater than $2q$, whatever point $v$, $|v| \le t$, it starts from. Therefore,

$$\mathbf{P}(\mu(t) > kN) = \mathbf{P}\Bigl(\max_{j \le kN} |S_j| < t\Bigr) \le \mathbf{P}\Bigl(\max_{j \le k} |S_{jN}| < t\Bigr) \le (1 - 2q)^k.$$

This implies that $\mathbf{P}(\mu(t) > kN)$ decreases exponentially as $k$ grows and that $\mathbf{E}\mu(t)$ is finite.


Example 4.4.4 A chain reaction scheme. Suppose we have a single initial particle which either disappears with probability $q$ or turns into $m$ similar particles with probability $p = 1 - q$. Each particle from the new generation behaves in the same way independently of the fortunes of the other particles. What is the expectation of the number $\zeta_n$ of particles in the $n$-th generation?

Consider the “double sequence” $\{\xi_k^{(n)}\}_{k=1,\,n=1}^{\infty,\ \infty}$ of independent identically distributed random variables assuming the values $m$ and 0 with probabilities $p$ and $q$, respectively. The sequences $\{\xi_k^{(1)}\}_{k=1}^\infty$, $\{\xi_k^{(2)}\}_{k=1}^\infty$, $\ldots$ will clearly be mutually independent. Using these sequences, one can represent the variables $\zeta_n$ ($\zeta_0 = 1$) as

$$\zeta_1 = \xi_1^{(1)}, \qquad \zeta_2 = \xi_1^{(2)} + \cdots + \xi_{\zeta_1}^{(2)}, \qquad \ldots, \qquad \zeta_n = \xi_1^{(n)} + \cdots + \xi_{\zeta_{n-1}}^{(n)},$$

where the number of summands in the equation for $\zeta_n$ is $\zeta_{n-1}$, the number of “parent particles”. Since the sequence $\{\xi_k^{(n)}\}$ is independent of $\zeta_{n-1}$, $\xi_k^{(n)} \ge 0$, and $\mathbf{E}\xi_k^{(n)} = pm$, by virtue of Wald’s identity we have

$$\mathbf{E}\zeta_n = pm\,\mathbf{E}\zeta_{n-1} = (pm)^n.$$
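The relation $\mathbf{E}\zeta_n = (pm)^n$ can be checked against the exact distribution of $\zeta_n$ for the first few generations. The sketch below propagates the law of the generation size; the function names and the parameter values used in any call are assumptions of the illustration.

```python
from fractions import Fraction
from math import comb

def next_generation(dist, m, p):
    """Law of zeta_n given the law `dist` of zeta_{n-1} (count -> probability)."""
    out = {}
    for parents, pr in dist.items():
        # Each parent independently turns into m particles with probability p
        # and disappears otherwise, so the number of surviving parents is binomial.
        for survivors in range(parents + 1):
            w = comb(parents, survivors) * p**survivors * (1 - p)**(parents - survivors)
            key = m * survivors
            out[key] = out.get(key, Fraction(0)) + pr * w
    return out

def generation_mean(n, m, p):
    """E zeta_n computed from the exact law, starting from zeta_0 = 1."""
    dist = {1: Fraction(1)}
    for _ in range(n):
        dist = next_generation(dist, m, p)
    return sum(k * pr for k, pr in dist.items())
```

For example, with `m = 2` and `p = Fraction(2, 3)` the product `p * m` exceeds 1 and the means grow geometrically, whereas for `p * m < 1` they decay.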
Example 4.4.5 We return to the fair game of two gamblers described in Example 4.2.3, but now assume that the respective capitals $z_1 > 0$ and $z_2 > 0$ of the gamblers are finite. Introduce random variables $\xi_k$ representing the gains of the first gambler in the respective ($k$-th) play. The variables $\xi_k$ are obviously independent, and

$$\xi_k = \begin{cases} 1 & \text{with probability } 1/2, \\ -1 & \text{with probability } 1/2. \end{cases}$$

The quantity $z_1 + S_k = z_1 + \sum_{j=1}^k \xi_j$ will be the capital of the first gambler and $z_2 - S_k$ the capital of the second gambler after $k$ plays. The quantity

$$\eta := \min\{k : z_1 + S_k = 0 \text{ or } z_2 - S_k = 0\}$$

is the time until the end of the game, i.e. until the ruin of one of the gamblers. The question is what is the probability $P_i$ that the $i$-th gambler wins (for $i = 1, 2$)?

Clearly, $\eta$ is a Markov time, $S_\eta = -z_1$ with probability $P_2$ and $S_\eta = z_2$ with probability $P_1 = 1 - P_2$. Therefore,

$$\mathbf{E}S_\eta = P_1 z_2 - P_2 z_1.$$

If $\mathbf{E}\eta < \infty$, then by Wald’s identity we have

$$P_1 z_2 - P_2 z_1 = \mathbf{E}\eta\,\mathbf{E}\xi_1 = 0.$$

From this we find that $P_i = z_i/(z_1 + z_2)$.

It remains to verify that $\mathbf{E}\eta$ is finite. Let, for the sake of simplicity, $z_1 + z_2 = 2z$ be even. With probability $2^{-\min(z_1, z_2)} \ge 2^{-z}$, the game can be completed in $\min(z_1, z_2) \le z$ plays. Since the total capital of both players remains unchanged during the game,

$$\mathbf{P}(\eta > z) \le 1 - 2^{-z}, \qquad \ldots, \qquad \mathbf{P}(\eta > Nz) \le (1 - 2^{-z})^N.$$

This evidently implies the finiteness of

$$\mathbf{E}\eta = \sum_{k=0}^\infty \mathbf{P}(\eta > k).$$
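The value $P_1 = z_1/(z_1 + z_2)$ can be cross-checked without Wald's identity by following the first gambler's capital play by play. The following sketch computes the probability that the first gambler has already won within a given number of plays; the function name and the horizon are hypothetical choices for the illustration.

```python
from fractions import Fraction

def win_probability_within(z1, z2, n_plays):
    """P(the second gambler is ruined within n_plays plays of the fair game)."""
    total = z1 + z2
    dist = {z1: Fraction(1)}     # capital of the first gambler -> probability
    won = Fraction(0)
    for _ in range(n_plays):
        nxt = {}
        for capital, pr in dist.items():
            for c in (capital + 1, capital - 1):
                if c == total:
                    won += pr / 2            # the first gambler holds everything
                elif c != 0:
                    nxt[c] = nxt.get(c, Fraction(0)) + pr / 2
                # c == 0: the first gambler is ruined, this mass is dropped
        dist = nxt
    return won
```

Because the probability of the game still being in progress decays geometrically, a few hundred plays already give $P_1$ to many decimal places for small capitals.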

We will now give a less trivial example of a random variable $\nu$ which is independent of the future, but is not a stopping time.


Example 4.4.6 Consider two mutually independent sequences of independent positive random variables $\xi_1, \xi_2, \ldots$ and $\zeta_1, \zeta_2, \ldots$, such that $\xi_j \subset\!\!\!= F$ and $\zeta_j \subset\!\!\!= G$. Further, consider a system consisting of two devices. After starting the system, the first device operates for a random time $\xi_1$ after which it breaks down. Then the second device replaces the first one and works for $\xi_2$ time units (over the time interval $(\xi_1, \xi_1 + \xi_2)$). Immediately after the first device’s breakdown, one starts repairing it, and the repair time is $\zeta_2$. If $\zeta_2 > \xi_2$, then at the time $\xi_1 + \xi_2$ of the second device’s failure both devices are faulty and the system fails. If $\zeta_2 \le \xi_2$, then at the time $\xi_1 + \xi_2$ the first device starts working again and works for $\xi_3$ time units, while the second device will be under repair for $\zeta_3$ time units. If $\zeta_3 > \xi_3$, the system fails. If $\zeta_3 \le \xi_3$, the second device will start working, etc. What is the expectation of the failure-free operation time $\tau$ of the system?

Let $\nu := \min\{k \ge 2 : \zeta_k > \xi_k\}$. Then clearly $\tau = \xi_1 + \cdots + \xi_\nu$, where the $\xi_j$ are independent and identically distributed and $\{\nu \le n\} \in \sigma(\xi_1, \ldots, \xi_n; \zeta_1, \ldots, \zeta_n)$. This means that $\nu$ is independent of the future. At the same time, if $\zeta_j \ne \text{const}$, then $\{\nu \le n\} \notin \mathfrak{F}_{1,n} = \sigma(\xi_1, \ldots, \xi_n)$ and $\nu$ is not a Markov time with respect to $\mathfrak{F}_{1,n}$. Since $\xi_k \ge 0$, by Wald’s identity $\mathbf{E}\tau = \mathbf{E}\nu\,\mathbf{E}\xi_1$. Since

$$\{\nu = k\} = \bigcap_{j=2}^{k-1} \{\zeta_j \le \xi_j\} \cap \{\zeta_k > \xi_k\}, \qquad k \ge 2,$$

one has $\mathbf{P}(\nu = k) = q^{k-2}(1 - q)$, $k \ge 2$, where

$$q = \mathbf{P}(\zeta_k \le \xi_k) = \int dF(t)\,G(t + 0).$$

Consequently,

$$\mathbf{E}\nu = \sum_{k=2}^\infty k q^{k-2}(1 - q) = 1 + \sum_{k=1}^\infty k q^{k-1}(1 - q) = 1 + \frac{1}{1 - q},$$

$$\mathbf{E}\tau = \mathbf{E}\xi_1\,\frac{2 - q}{1 - q}.$$
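The closed form for $\mathbf{E}\nu$ is easy to confirm numerically by summing the series. In the sketch below the function names, the number of terms and the sample values of $q$ and $\mathbf{E}\xi_1$ are assumptions made for the illustration.

```python
def mean_nu_series(q, terms=10_000):
    """Partial sum of E nu = sum over k >= 2 of k * q**(k-2) * (1 - q)."""
    return sum(k * q ** (k - 2) * (1 - q) for k in range(2, terms + 2))

def mean_nu_closed(q):
    """Closed form 1 + 1/(1 - q) = (2 - q)/(1 - q)."""
    return (2 - q) / (1 - q)

def mean_operation_time(mean_xi, q):
    """E tau = E xi_1 * E nu, as given by Wald's identity."""
    return mean_xi * mean_nu_closed(q)
```

For instance, with a mean device lifetime of 10 time units and $q = 0.5$ the system works for 30 time units on average, three times longer than a single device.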


Wald’s  identity  has  a  number  of  extensions  (we  will  discuss  these  in  more  detail 
in  Sects.  10.3  and  15.2). 


4.5  Variance 

We  introduce  one  more  numerical  characteristic  for  random  variables. 

Definition 4.5.1 The variance $\operatorname{Var}(\xi)$ of a random variable $\xi$ is the quantity

$$\operatorname{Var}(\xi) := \mathbf{E}(\xi - \mathbf{E}\xi)^2.$$

It is a measure of the “dispersion” or “spread” of the distribution of $\xi$. The variance is equal to the inertia moment of the distribution of unit mass along the line. We have

$$\operatorname{Var}(\xi) = \mathbf{E}\xi^2 - 2\,\mathbf{E}\xi\,\mathbf{E}\xi + (\mathbf{E}\xi)^2 = \mathbf{E}\xi^2 - (\mathbf{E}\xi)^2. \tag{4.5.1}$$

The variance could also be defined as $\min_a \mathbf{E}(\xi - a)^2$. Indeed, by that definition

$$\operatorname{Var}(\xi) = \mathbf{E}\xi^2 + \min_a\,(a^2 - 2a\,\mathbf{E}\xi) = \mathbf{E}\xi^2 - (\mathbf{E}\xi)^2,$$

since the minimum of $a^2 - 2a\,\mathbf{E}\xi$ is attained at the point $a = \mathbf{E}\xi$. This remark shows that the quantity $a = \mathbf{E}\xi$ is the best mean square estimate (approximation) of the random variable $\xi$.

The quantity $\sqrt{\operatorname{Var}(\xi)}$ is called the standard deviation of $\xi$.
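Both descriptions of the variance, formula (4.5.1) and the minimum of $\mathbf{E}(\xi - a)^2$ over $a$, can be compared on a small discrete distribution. The law used below and the function names are hypothetical; the sketch only illustrates the two definitions.

```python
from fractions import Fraction

# Assumed discrete law for the illustration: value -> probability.
law = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}

def mean(law):
    """E xi."""
    return sum(x * p for x, p in law.items())

def mean_square_error(law, a):
    """E (xi - a)^2 for a constant a."""
    return sum((x - a) ** 2 * p for x, p in law.items())

def variance(law):
    """Var(xi) = E xi^2 - (E xi)^2, formula (4.5.1)."""
    return sum(x * x * p for x, p in law.items()) - mean(law) ** 2
```

Evaluating `mean_square_error(law, a)` at `a = mean(law)` reproduces `variance(law)`, and any other `a` gives a larger value.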
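Example 4.5.1 Let $\xi \subset\!\!\!= \boldsymbol{\Phi}_{a,\sigma^2}$. As we saw in Example 4.1.2, $a = \mathbf{E}\xi$. Therefore,

$$\operatorname{Var}(\xi) = \int_{-\infty}^{\infty} (x - a)^2 \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-a)^2/(2\sigma^2)}\,dx = \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} t^2 e^{-t^2/2}\,dt.$$

The last equality was obtained by the variable change $(x - a)/\sigma = t$. Integrating by parts, one gets

$$\operatorname{Var}(\xi) = -\frac{\sigma^2}{\sqrt{2\pi}}\,t e^{-t^2/2}\Big|_{-\infty}^{\infty} + \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-t^2/2}\,dt = \sigma^2.$$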


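Example 4.5.2 Let $\xi \sim \Pi_\mu$. In Example 4.1.3 we computed the expectation $\mathbf E\xi = \mu$.
Hence

$$\operatorname{Var}(\xi) = \mathbf E\xi^2 - (\mathbf E\xi)^2 = \sum_{k=0}^{\infty} \frac{k^2 \mu^k}{k!} e^{-\mu} - \mu^2 = \sum_{k=2}^{\infty} \frac{k(k-1)\mu^k}{k!} e^{-\mu} + \sum_{k=0}^{\infty} \frac{k \mu^k}{k!} e^{-\mu} - \mu^2 = \mu^2 + \mu - \mu^2 = \mu.$$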


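Example 4.5.3 For $\xi \sim U_{0,1}$, one has

$$\mathbf E\xi^2 = \int_0^1 x^2\,dx = \frac{1}{3}, \qquad \mathbf E\xi = \frac{1}{2}.$$

By (4.5.1) we obtain $\operatorname{Var}(\xi) = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}$.

Example 4.5.4 For $\xi \sim B_p$, by virtue of the relations $\xi^2 = \xi$ and $\mathbf E\xi^2 = \mathbf E\xi = p$ we
obtain $\operatorname{Var}(\xi) = p - p^2 = p(1 - p)$.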
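The variances found in Examples 4.5.2 to 4.5.4 can be checked numerically with formula (4.5.1). The sketch below is only an illustration: the parameter values $p = 3/10$ and $\mu = 2.5$, the truncation of the Poisson series at 100 terms, and the helper name `var_discrete` are assumptions made here.

```python
import math
from fractions import Fraction

def var_discrete(pmf):
    """Variance via (4.5.1) for a finite pmf given as {value: probability}."""
    m1 = sum(p * x for x, p in pmf.items())
    m2 = sum(p * x * x for x, p in pmf.items())
    return m2 - m1 ** 2

# Bernoulli with success probability p: expect p(1 - p).
p = Fraction(3, 10)
var_bernoulli = var_discrete({1: p, 0: 1 - p})

# Poisson with parameter mu, truncated at 100 terms
# (the neglected tail is negligible for mu = 2.5).
mu = 2.5
poisson = {}
term = math.exp(-mu)          # P(xi = 0)
for k in range(100):
    poisson[k] = term
    term *= mu / (k + 1)      # P(xi = k + 1) obtained from P(xi = k)
var_poisson = var_discrete(poisson)

# Uniform on [0, 1]: E xi^2 = 1/3 and E xi = 1/2, so Var = 1/3 - 1/4.
var_uniform = Fraction(1, 3) - Fraction(1, 2) ** 2
```

The Bernoulli and uniform cases are exact because of `Fraction`; the Poisson case agrees with $\mu$ only up to floating-point rounding.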

Consider  now  some  properties  of  the  variance. 

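D1. $\operatorname{Var}(\xi) \ge 0$, with $\operatorname{Var}(\xi) = 0$ if and only if $\mathbf P(\xi = c) = 1$, where $c$ is a constant
(not depending on $\omega$).

The first assertion is obvious, for $\operatorname{Var}(\xi) = \mathbf E(\xi - \mathbf E\xi)^2 \ge 0$. Let
$\mathbf P(\xi = c) = 1$, then $(\mathbf E\xi)^2 = \mathbf E\xi^2 = c^2$ and hence

$$\operatorname{Var}(\xi) = c^2 - c^2 = 0.$$

If $\operatorname{Var}(\xi) = \mathbf E(\xi - \mathbf E\xi)^2 = 0$ then (since $(\xi - \mathbf E\xi)^2 \ge 0$) $\mathbf P(\xi - \mathbf E\xi = 0) = 1$, or
$\mathbf P(\xi = \mathbf E\xi) = 1$ (see property E4).

D2. If $a$ and $b$ are constants then

$$\operatorname{Var}(a + b\xi) = b^2 \operatorname{Var}(\xi).$$

This property follows immediately from the definition of $\operatorname{Var}(\xi)$.

D3. If random variables $\xi$ and $\eta$ are independent then

$$\operatorname{Var}(\xi + \eta) = \operatorname{Var}(\xi) + \operatorname{Var}(\eta).$$

Indeed,

$$\operatorname{Var}(\xi + \eta) = \mathbf E(\xi + \eta)^2 - (\mathbf E\xi + \mathbf E\eta)^2$$
$$= \mathbf E\xi^2 + 2\mathbf E\xi\,\mathbf E\eta + \mathbf E\eta^2 - (\mathbf E\xi)^2 - (\mathbf E\eta)^2 - 2\mathbf E\xi\,\mathbf E\eta$$
$$= \mathbf E\xi^2 - (\mathbf E\xi)^2 + \mathbf E\eta^2 - (\mathbf E\eta)^2 = \operatorname{Var}(\xi) + \operatorname{Var}(\eta).$$

It is seen from the computations that the variance will be additive not only for
independent $\xi$ and $\eta$, but also whenever

$$\mathbf E\xi\eta = \mathbf E\xi\,\mathbf E\eta.$$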

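Example 4.5.5 Let $\nu \ge 0$ be an integer-valued random variable independent of a
sequence $\{\xi_j\}$ of independent identically distributed random variables, $\mathbf E\nu < \infty$ and
$\mathbf E\xi_j = a$. Then, as we know, $\mathbf E S_\nu = a\,\mathbf E\nu$. What is the variance of $S_\nu$?
By the total probability formula,

$$\operatorname{Var}(S_\nu) = \mathbf E(S_\nu - \mathbf E S_\nu)^2 = \sum_k \mathbf P(\nu = k)\,\mathbf E(S_k - \mathbf E S_\nu)^2$$
$$= \sum_k \mathbf P(\nu = k)\big[\mathbf E(S_k - ak)^2 + (ak - a\,\mathbf E\nu)^2\big]$$
$$= \sum_k \mathbf P(\nu = k)\,k \operatorname{Var}(\xi_1) + a^2\,\mathbf E(\nu - \mathbf E\nu)^2 = \operatorname{Var}(\xi_1)\,\mathbf E\nu + a^2 \operatorname{Var}(\nu).$$

This equality is equivalent to the relation

$$\mathbf E(S_\nu - \nu a)^2 = \mathbf E\nu \cdot \operatorname{Var}(\xi_1).$$

In this form, the relation remains valid for any stopping time $\nu$ (see Chap. 15).
Making use of it, one can find in Example 4.4.5 the expectation of the time $\eta$ until
the end of the fair game, when the initial capitals $z_1$ and $z_2$ of the players are finite.
Indeed, in that case $a = 0$, $\operatorname{Var}(\xi_1) = 1$ and

$$\mathbf E S_\eta^2 = \operatorname{Var}(\xi_1)\,\mathbf E\eta = z_1^2 P_2 + z_2^2 P_1.$$

We find from this that $\mathbf E\eta = z_1 z_2$.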
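The formula $\operatorname{Var}(S_\nu) = \operatorname{Var}(\xi_1)\,\mathbf E\nu + a^2\operatorname{Var}(\nu)$ can be verified exactly on a small case by building the distribution of $S_\nu$ from the total probability formula. The following Python sketch does this; the two distributions `xi_pmf` and `nu_pmf` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical distributions chosen only for illustration.
xi_pmf = {-1: Fraction(1, 4), 2: Fraction(3, 4)}                     # summands xi_j
nu_pmf = {0: Fraction(1, 5), 1: Fraction(3, 10), 2: Fraction(1, 2)}  # number of summands nu

def moments(pmf):
    """Return (mean, variance) of a finite pmf {value: probability}."""
    mean = sum(p * x for x, p in pmf.items())
    var = sum(p * (x - mean) ** 2 for x, p in pmf.items())
    return mean, var

# Distribution of S_nu by the total probability formula: condition on nu = k
# and enumerate all k-tuples of summand values (nu is independent of the xi_j).
s_pmf = {}
for k, pk in nu_pmf.items():
    for values in product(xi_pmf, repeat=k):
        prob = pk
        for v in values:
            prob *= xi_pmf[v]
        s = sum(values)
        s_pmf[s] = s_pmf.get(s, Fraction(0)) + prob

a, var_xi = moments(xi_pmf)
e_nu, var_nu = moments(nu_pmf)
e_s, var_s = moments(s_pmf)
```

Because all arithmetic is done with `Fraction`, the comparisons `e_s == a * e_nu` and `var_s == var_xi * e_nu + a**2 * var_nu` hold exactly rather than approximately.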


4.6  The  Correlation  Coefficient  and  Other  Numerical 
Characteristics 

Two random variables $\xi$ and $\eta$ could be functionally (deterministically) dependent:
$\xi = g(\eta)$; they could be dependent, but not in a deterministic way; finally, they could
be independent. The correlation coefficient of random variables is a quantity which
can be used to quantify the degree of dependence of the variables on each other.

All the random variables to appear in the present section are assumed to have
finite non-zero variances.

A random variable $\xi$ is said to be standardised if $\mathbf E\xi = 0$ and $\operatorname{Var}(\xi) = 1$. Any
random variable $\xi$ can be reduced by a linear transformation to a standardised one
by putting $\xi_1 := (\xi - \mathbf E\xi)/\sqrt{\operatorname{Var}(\xi)}$. Let $\xi$ and $\eta$ be two random variables and $\xi_1$
and $\eta_1$ the respective standardised random variables.

Definition 4.6.1 The correlation coefficient of the random variables $\xi$ and $\eta$ is the
quantity $\rho(\xi, \eta) = \mathbf E\xi_1\eta_1$.

Properties  of  the  correlation  coefficient. 

1. $|\rho(\xi, \eta)| \le 1$.

Proof Indeed,

$$0 \le \operatorname{Var}(\xi_1 \pm \eta_1) = \mathbf E(\xi_1 \pm \eta_1)^2 = 2 \pm 2\rho(\xi, \eta).$$

It follows that $|\rho(\xi, \eta)| \le 1$. □

2. If $\xi$ and $\eta$ are independent then $\rho(\xi, \eta) = 0$.

This follows from the fact that $\xi_1$ and $\eta_1$ are also independent in this case. □

The converse assertion is not true, of course. In Sect. 4.3 we gave an example of
dependent random variables $\xi$ and $\eta$ such that $\mathbf E\xi = 0$, $\mathbf E\eta = 0$ and $\mathbf E\xi\eta = 0$. The
correlation coefficient of these variables is equal to 0, yet they are dependent. However,
as we will see in Chap. 7, for a normally distributed vector $(\xi, \eta)$ the equality
$\rho(\xi, \eta) = 0$ is necessary and sufficient for the independence of its components.

Another example where the non-correlation of random variables implies their
independence is given by the Bernoulli scheme. Let $\mathbf P(\xi = 1) = p$, $\mathbf P(\xi = 0) = 1 - p$,
$\mathbf P(\eta = 1) = q$ and $\mathbf P(\eta = 0) = 1 - q$. Then

$$\mathbf E\xi = p, \quad \mathbf E\eta = q, \quad \operatorname{Var}(\xi) = p(1 - p), \quad \operatorname{Var}(\eta) = q(1 - q),$$

$$\rho(\xi, \eta) = \frac{\mathbf E(\xi - p)(\eta - q)}{\sqrt{pq(1 - p)(1 - q)}}.$$

The equality $\rho(\xi, \eta) = 0$ means that $\mathbf E\xi\eta = pq$, or, which is the same,

$$\mathbf P(\xi = 1, \eta = 1) = \mathbf P(\xi = 1)\mathbf P(\eta = 1),$$
$$\mathbf P(\xi = 1, \eta = 0) = \mathbf P(\xi = 1) - \mathbf P(\xi = 1, \eta = 1) = p - pq = \mathbf P(\xi = 1)\mathbf P(\eta = 0),$$

and so on.

One can easily obtain from this that, in the general case, $\xi$ and $\eta$ are independent
if

$$\rho(f(\xi), g(\eta)) = 0$$

for any bounded measurable functions $f$ and $g$. It suffices to take $f = I_{(-\infty,x)}$,
$g = I_{(-\infty,y)}$, then derive that $\mathbf P(\xi < x, \eta < y) = \mathbf P(\xi < x)\mathbf P(\eta < y)$, and make use
of the results of the previous chapter.

3. $|\rho(\xi, \eta)| = 1$ if and only if there exist numbers $a$ and $b \ne 0$ such that
$\mathbf P(\eta = a + b\xi) = 1$.

Proof Let $\mathbf P(\eta = a + b\xi) = 1$. Set $\mathbf E\xi = \alpha$ and $\sqrt{\operatorname{Var}(\xi)} = \sigma$; then

$$\rho(\xi, \eta) = \mathbf E\,\frac{\xi - \alpha}{\sigma} \cdot \frac{a + b\xi - a - b\alpha}{|b|\sigma} = \operatorname{sign} b.$$

Assume now that $|\rho(\xi, \eta)| = 1$. Let, for instance, $\rho(\xi, \eta) = 1$. Then

$$\operatorname{Var}(\xi_1 - \eta_1) = 2(1 - \rho(\xi, \eta)) = 0.$$

By property D1 of the variance, this can be the case if and only if

$$\mathbf P(\xi_1 - \eta_1 = c) = 1.$$

If $\rho(\xi, \eta) = -1$ then we get $\operatorname{Var}(\xi_1 + \eta_1) = 0$, and hence

$$\mathbf P(\xi_1 + \eta_1 = c) = 1. \qquad \square$$

If $\rho > 0$ then the random variables $\xi$ and $\eta$ are said to be positively correlated; if
$\rho < 0$ then $\xi$ and $\eta$ are said to be negatively correlated.


Example 4.6.1 Consider a transmitting device. A random variable $\xi$ denotes the
magnitude of the transmitted signal. Because of interference, a receiver gets the
variable $\eta = a\xi + \Delta$ ($a$ is the amplification coefficient, $\Delta$ is the noise). Assume that
the random variables $\Delta$ and $\xi$ are independent. Let $\mathbf E\xi = \alpha$, $\operatorname{Var}(\xi) = 1$, $\mathbf E\Delta = 0$
and $\operatorname{Var}(\Delta) = \sigma^2$. Compute the correlation coefficient of $\xi$ and $\eta$:

$$\rho(\xi, \eta) = \mathbf E\left[(\xi - \alpha)\,\frac{a\xi + \Delta - a\alpha}{\sqrt{a^2 + \sigma^2}}\right] = \frac{a}{\sqrt{a^2 + \sigma^2}}.$$

If $\sigma$ is a large number compared to the amplification $a$, then $\rho$ is close to 0 and $\eta$
essentially does not depend on $\xi$. If $\sigma$ is small compared to $a$, then $\rho$ is close to 1,
and one can easily reconstruct $\xi$ from $\eta$.
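The dependence of $\rho$ on the ratio of noise to amplification in Example 4.6.1 can be seen in a short computation. The sketch below assumes, purely for illustration, two-point distributions for the signal and the noise (any distributions with the stated means and variances would give the same correlation); the function names are hypothetical.

```python
import math
from itertools import product

def correlation(a, sigma):
    """rho(xi, eta) for eta = a*xi + delta, computed by direct enumeration."""
    xi_vals = [0.0, 2.0]            # xi: mean 1, variance 1, each value with probability 1/2
    delta_vals = [-sigma, sigma]    # delta: mean 0, variance sigma^2, each with probability 1/2
    # Independence of xi and delta: the 4 combinations are equally likely.
    pairs = [(x, a * x + d) for x, d in product(xi_vals, delta_vals)]
    n = len(pairs)
    ex = sum(x for x, _ in pairs) / n
    ey = sum(y for _, y in pairs) / n
    cov = sum((x - ex) * (y - ey) for x, y in pairs) / n
    vx = sum((x - ex) ** 2 for x, _ in pairs) / n
    vy = sum((y - ey) ** 2 for _, y in pairs) / n
    return cov / math.sqrt(vx * vy)

def formula(a, sigma):
    """Closed form a / sqrt(a^2 + sigma^2) from Example 4.6.1."""
    return a / math.sqrt(a * a + sigma * sigma)
```

For instance, `correlation(2.0, 0.1)` is close to 1 (weak noise), while `correlation(0.5, 10.0)` is close to 0 (strong noise), matching the two regimes described in the example.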


We  consider  some  further  numerical  characteristics  of  random  variables.  One 
often  uses  the  so-called  higher  order  moments. 

Definition 4.6.2 The $k$-th order moment of a random variable $\xi$ is the quantity $\mathbf E\xi^k$.
The quantity $\mathbf E(\xi - \mathbf E\xi)^k$ is called the $k$-th order central moment, so the variance is
the second central moment of $\xi$.

Given a random vector $(\xi_1, \ldots, \xi_n)$, the quantity $\mathbf E\xi_1^{k_1} \cdots \xi_n^{k_n}$ is called the mixed
moment of order $k_1 + \cdots + k_n$. Similarly, $\mathbf E(\xi_1 - \mathbf E\xi_1)^{k_1} \cdots (\xi_n - \mathbf E\xi_n)^{k_n}$ is said to
be the central mixed moment of the same order.


For  independent  random  variables,  mixed  moments  are  evidently  equal  to  the 
products  of  the  respective  usual  moments. 


4.7  Inequalities 

4.7.1 Moment Inequalities

Theorem 4.7.1 (Cauchy-Bunjakovsky's inequality) If $\xi_1$ and $\xi_2$ are arbitrary random
variables, then

$$\mathbf E|\xi_1\xi_2| \le \big[\mathbf E\xi_1^2\,\mathbf E\xi_2^2\big]^{1/2}.$$

This inequality is also sometimes called the Schwarz inequality.

Proof The required relation follows from the inequality $2|ab| \le a^2 + b^2$ if one puts
$a^2 = \xi_1^2/\mathbf E\xi_1^2$, $b^2 = \xi_2^2/\mathbf E\xi_2^2$ and takes the expectations of both sides. □

The Cauchy-Bunjakovsky inequality is a special case of more general inequalities.

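Theorem 4.7.2 For $r > 1$, $\frac{1}{r} + \frac{1}{s} = 1$, one has Hölder's inequality:

$$\mathbf E|\xi_1\xi_2| \le \big[\mathbf E|\xi_1|^r\big]^{1/r}\big[\mathbf E|\xi_2|^s\big]^{1/s},$$

and Minkowski's inequality:

$$\big[\mathbf E|\xi_1 + \xi_2|^r\big]^{1/r} \le \big[\mathbf E|\xi_1|^r\big]^{1/r} + \big[\mathbf E|\xi_2|^r\big]^{1/r}.$$

Proof Since $x^r$ is, for $r > 1$, a convex function in the domain $x > 0$, which at the
point $x = 1$ is equal to 1 and has derivative equal to $r$, one has $r(x - 1) \le x^r - 1$
for all $x > 0$. Putting $x = (a/b)^{1/r}$ ($a > 0$, $b > 0$), we obtain

$$a^{1/r} b^{1 - 1/r} - b \le \frac{a}{r} - \frac{b}{r},$$

or, which is the same, $a^{1/r} b^{1/s} \le a/r + b/s$. If one puts

$$a := \frac{|\xi_1|^r}{\mathbf E|\xi_1|^r}, \qquad b := \frac{|\xi_2|^s}{\mathbf E|\xi_2|^s}$$

and takes the expectations, one gets Hölder's inequality.

To prove Minkowski's inequality, note that, by the inequality $|\xi_1 + \xi_2| \le |\xi_1| + |\xi_2|$,
one has

$$\mathbf E|\xi_1 + \xi_2|^r \le \mathbf E|\xi_1||\xi_1 + \xi_2|^{r-1} + \mathbf E|\xi_2||\xi_1 + \xi_2|^{r-1}.$$

Applying Hölder's inequality to the terms on the right-hand side, we obtain

$$\mathbf E|\xi_1 + \xi_2|^r \le \Big\{\big[\mathbf E|\xi_1|^r\big]^{1/r} + \big[\mathbf E|\xi_2|^r\big]^{1/r}\Big\}\big[\mathbf E|\xi_1 + \xi_2|^{(r-1)s}\big]^{1/s}.$$

Since $(r - 1)s = r$ and $1 - 1/s = 1/r$, Minkowski's inequality follows. □

It is obvious that, for $r = s = 2$, Hölder's inequality becomes the Schwarz
inequality.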
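Both inequalities of Theorem 4.7.2 can be observed on a finite sample space with equally likely outcomes, where expectations are plain averages. The Python sketch below is an illustration under that assumption; the sample size, the seed, the two sampling distributions and the exponent $r = 3$ are arbitrary choices made here.

```python
import random

def mean(values):
    return sum(values) / len(values)

def norm(values, r):
    """[E|xi|^r]^(1/r) on a finite sample space with equally likely outcomes."""
    return mean([abs(v) ** r for v in values]) ** (1.0 / r)

random.seed(0)                      # fixed seed so the run is reproducible
xi1 = [random.uniform(-3, 3) for _ in range(1000)]
xi2 = [random.gauss(0, 2) for _ in range(1000)]

r = 3.0
s = r / (r - 1.0)                   # conjugate exponent: 1/r + 1/s = 1

# Hoelder: E|xi1 xi2| <= [E|xi1|^r]^(1/r) [E|xi2|^s]^(1/s).
holder_lhs = mean([abs(x * y) for x, y in zip(xi1, xi2)])
holder_rhs = norm(xi1, r) * norm(xi2, s)

# Minkowski: [E|xi1 + xi2|^r]^(1/r) <= [E|xi1|^r]^(1/r) + [E|xi2|^r]^(1/r).
minkowski_lhs = norm([x + y for x, y in zip(xi1, xi2)], r)
minkowski_rhs = norm(xi1, r) + norm(xi2, r)
```

Since the empirical distribution of the sample is itself a probability distribution, the inequalities hold for every sample, not merely on average.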

Theorem 4.7.3 (Jensen's inequality) If $\mathbf E\xi$ exists and $g(x)$ is a convex function,
then $g(\mathbf E\xi) \le \mathbf E g(\xi)$.

Proof If $g(x)$ is convex then for any $y$ there exists a number $g_1(y)$ such that, for
all $x$,

$$g(x) \ge g(y) + (x - y)g_1(y).$$

Putting $x = \xi$, $y = \mathbf E\xi$, and taking the expectations of both sides of this inequality,
we obtain

$$\mathbf E g(\xi) \ge g(\mathbf E\xi). \qquad \square$$

The following corollary is also often useful.

Corollary 4.7.1 For any $0 < v < u$,

$$\big[\mathbf E|\xi|^v\big]^{1/v} \le \big[\mathbf E|\xi|^u\big]^{1/u}. \qquad (4.7.1)$$

This inequality shows, in particular, that if the $u$-th order moment exists, then the
moments of any order $v < u$ also exist.

Inequality (4.7.1) follows from Hölder's inequality, if one puts $\xi_1 := |\xi|^v$,
$\xi_2 := 1$, $r := u/v$, or from Jensen's inequality with $g(x) = |x|^{u/v}$ and $|\xi|^v$ in place
of $\xi$.


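4.7.2 Inequalities for Probabilities

Theorem 4.7.4 Let $\xi \ge 0$ with probability 1. Then, for any $x > 0$,

$$\mathbf P(\xi \ge x) \le \frac{\mathbf E(\xi;\ \xi \ge x)}{x} \le \frac{\mathbf E\xi}{x}. \qquad (4.7.2)$$

If $\mathbf E\xi < \infty$ then $\mathbf P(\xi \ge x) = o(1/x)$ as $x \to \infty$.

Proof The inequality is proved by the following relations:

$$\mathbf E\xi \ge \mathbf E(\xi;\ \xi \ge x) \ge x\,\mathbf E(1;\ \xi \ge x) = x\,\mathbf P(\xi \ge x).$$

If $\mathbf E\xi < \infty$ then $\mathbf E(\xi;\ \xi \ge x) \to 0$ as $x \to \infty$. This proves the second statement
of the theorem. □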


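If a function $g(x) \ge 0$ is monotonically increasing, then clearly
$\{\xi : g(\xi) \ge g(\varepsilon)\} = \{\xi : \xi \ge \varepsilon\}$ and, applying Theorem 4.7.4 to the random variable $\eta = g(\xi)$,
one gets

Corollary 4.7.2 If $g(x) \uparrow$, $g(x) \ge 0$, then

$$\mathbf P(\xi \ge x) \le \frac{\mathbf E(g(\xi);\ \xi \ge x)}{g(x)} \le \frac{\mathbf E g(\xi)}{g(x)}.$$

In particular, for $g(x) = e^{\lambda x}$,

$$\mathbf P(\xi \ge x) \le e^{-\lambda x}\,\mathbf E e^{\lambda\xi}, \qquad \lambda > 0.$$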
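To get a feeling for how the bound of Theorem 4.7.4 compares with the exponential bound of Corollary 4.7.2, one can evaluate both against an exact tail probability. The sketch below assumes, only as an example, that $\xi$ has the binomial distribution with parameters 20 and 1/2 and takes the threshold $x = 15$; the grid search over $\lambda$ is a crude stand-in for exact minimisation.

```python
import math

# Hypothetical parameters: xi ~ Binomial(20, 1/2), threshold x = 15.
n, p, x = 20, 0.5, 15
pmf = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

tail = sum(pmf[k] for k in range(x, n + 1))             # exact P(xi >= x)
markov = sum(k * pmf[k] for k in range(n + 1)) / x      # E xi / x from Theorem 4.7.4

def exp_bound(lam):
    """e^{-lam x} E e^{lam xi}, a valid bound for every lam > 0."""
    return math.exp(-lam * x) * sum(math.exp(lam * k) * pmf[k] for k in range(n + 1))

# Crude grid search over lam in (0, 3]; any lam > 0 gives a valid bound,
# and the smallest value found is the sharpest one on this grid.
best_exp = min(exp_bound(j / 100) for j in range(1, 301))
```

In this example the exact tail, the optimised exponential bound and the first-moment bound are ordered as `tail <= best_exp <= markov`, which shows how much the choice of $g$ can matter.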


Corollary 4.7.3 (Chebyshev's inequality) For an arbitrary random variable $\xi$ with
a finite variance,

$$\mathbf P(|\xi - \mathbf E\xi| \ge x) \le \frac{\operatorname{Var}(\xi)}{x^2}. \qquad (4.7.3)$$

To prove (4.7.3), it suffices to apply Theorem 4.7.4 to the random variable
$\eta = (\xi - \mathbf E\xi)^2 \ge 0$. □

The assertions of Theorem 4.7.4 and Corollary 4.7.2 are also often called Chebyshev's
inequalities (or Chebyshev type inequalities), since in regard to their proofs,
they are unessential generalisations of (4.7.3).

Using Chebyshev's inequality, we can bound probabilities of various deviations
of $\xi$ knowing only $\mathbf E\xi$ and $\operatorname{Var}(\xi)$. As one of the first applications of this inequality,
we will derive the so-called law of large numbers in Chebyshev's form (the law of
large numbers in a more general form will be obtained in Chap. 8).


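Theorem 4.7.5 Let $\xi_1, \xi_2, \ldots$ be independent identically distributed random variables
with expectation $\mathbf E\xi_j = a$ and finite variance $\sigma^2$ and let $S_n = \sum_{j=1}^{n} \xi_j$. Then,
for any $\varepsilon > 0$,

$$\mathbf P\left(\left|\frac{S_n}{n} - a\right| > \varepsilon\right) \to 0 \quad \text{as } n \to \infty.$$

We will discuss this assertion in Chaps. 5, 6 and 8.

Proof of Theorem 4.7.5 follows from Chebyshev's inequality, for

$$\mathbf E\,\frac{S_n}{n} = a, \qquad \operatorname{Var}\left(\frac{S_n}{n}\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$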


Now we will give a computational example of the use of Chebyshev's inequality.

Example 4.7.1 Assume we decided to measure the diameter of the lunar disk using
photographs made with a telescope. Due to atmospheric interference, measurements
of pictures made at different times give different results. Let $\xi - a$ denote
the deviation of the result of a measurement from the true value $a$, $\mathbf E\xi = a$ and
$\sigma = \sqrt{\operatorname{Var}(\xi)} = 1$ on a certain scale. Carry out a series of $n$ independent measurements
and put $\zeta_n := \frac{1}{n}(\xi_1 + \cdots + \xi_n)$. Then, as we saw, $\mathbf E\zeta_n = a$, $\operatorname{Var}(\zeta_n) = \sigma^2/n$.
Since the variance of the average of the measurements decreases as the number of
observations increases, it is natural to estimate the quantity $a$ by $\zeta_n$.

How many observations should be made to ensure $|\zeta_n - a| \le 0.1$ with a probability
greater than 0.99? That is, we must have $\mathbf P(|\zeta_n - a| \le 0.1) > 0.99$, or
$\mathbf P(|\zeta_n - a| > 0.1) \le 0.01$. By Chebyshev's inequality, $\mathbf P(|\zeta_n - a| > 0.1) \le \sigma^2/(n \cdot 0.01)$.
Therefore, if $n$ is chosen so that $\sigma^2/(n \cdot 0.01) \le 0.01$ then the required inequality
will be satisfied. Hence we get $n \ge 10^4$.

The above example illustrates the possibility of using Chebyshev's inequality to
bound the probabilities of the deviations of random variables. However, this example
is an even better illustration of how crude Chebyshev's inequality is for practical
purposes. If the reader returns to Example 4.7.1 after meeting with the central limit
theorem in Chap. 8, he/she will easily calculate that, to achieve the required accuracy,
one actually needs to conduct not $10^4$, but only 670 observations.
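The two sample sizes in Example 4.7.1 can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the second computation replaces the distribution of $\zeta_n$ by its normal approximation (which the central limit theorem of Chap. 8 justifies only asymptotically), and the variable names are chosen here for illustration.

```python
import math
from fractions import Fraction
from statistics import NormalDist

sigma2 = Fraction(1)        # Var(xi) = 1, as in Example 4.7.1
eps = Fraction(1, 10)       # required accuracy 0.1
delta = Fraction(1, 100)    # allowed failure probability 0.01

# Chebyshev: sigma^2 / (n eps^2) <= delta  <=>  n >= sigma^2 / (eps^2 delta).
n_chebyshev = math.ceil(sigma2 / (eps ** 2 * delta))

# Normal approximation (an assumption, not an exact statement):
# P(|zeta_n - a| > eps) ~ 2 (1 - Phi(eps sqrt(n) / sigma)) <= delta,
# i.e. eps sqrt(n) / sigma >= z, where z is the (1 - delta/2)-quantile of N(0, 1).
z = NormalDist().inv_cdf(1 - float(delta) / 2)
n_normal = math.ceil(float(sigma2) * (z / float(eps)) ** 2)
```

The first value equals $10^4$, while the second is a number in the high six hundreds, of the same order as the figure of 670 quoted above; the precise value depends on how the normal quantile is rounded.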


4.8  Extension  of  the  Notion  of  Conditional  Expectation 

In  conclusion  to  the  present  chapter,  we  will  introduce  a  notion  which,  along  with 
those  we  have  already  discussed,  is  a  useful  and  important  tool  in  probability  theory. 
Giving  the  reader  the  option  to  skip  this  section  in  the  first  reading  of  the  book,  we 
avoid  direct  use  of  this  notion  until  Chaps.  13  and  15. 


4.8.1  Definition  of  Conditional  Expectation 


In Sect. 4.2 we introduced the notion of conditional expectation given an arbitrary
event $B$ with $\mathbf P(B) > 0$ that was defined by the equality

$$\mathbf E(\xi \mid B) := \frac{\mathbf E(\xi; B)}{\mathbf P(B)}, \qquad (4.8.1)$$

where

$$\mathbf E(\xi; B) = \int_B \xi\,d\mathbf P = \mathbf E\xi I_B,$$

$I_B = I_B(\omega)$ being the indicator of the set $B$. We have already seen and will see many
times in what follows that this is a very useful notion. Definition (4.8.1) introducing
this notion has, however, the deficiency that it makes no sense when $\mathbf P(B) = 0$. How
could one overcome this deficiency?

The fact that the condition $\mathbf P(B) > 0$ should not play any substantial role could be
illustrated by the following considerations. Assume that $\xi$ and $\eta$ are independent,
$B = \{\eta = x\}$ and $\mathbf P(B) > 0$. Then, for any measurable function $\varphi(x, y)$, one has
according to (4.8.1) that

$$\mathbf E[\varphi(\xi, \eta) \mid \eta = x] = \frac{\mathbf E\varphi(\xi, \eta) I_{\{\eta = x\}}}{\mathbf P(\eta = x)} = \frac{\mathbf E\varphi(\xi, x) I_{\{\eta = x\}}}{\mathbf P(\eta = x)} = \mathbf E\varphi(\xi, x). \qquad (4.8.2)$$

The last equality holds because the random variables $\varphi(\xi, x)$ and $I_{\{\eta = x\}}$ are
independent, being functions of $\xi$ and $\eta$ respectively, and consequently

$$\mathbf E\varphi(\xi, \eta) I_{\{\eta = x\}} = \mathbf E\varphi(\xi, x)\,\mathbf P(\eta = x).$$

Relations (4.8.2) show that the notion of conditional expectation could also retain
its meaning in the case when the probability of the condition is 0, for the equality

$$\mathbf E[\varphi(\xi, \eta) \mid \eta = x] = \mathbf E\varphi(\xi, x)$$

itself looks quite natural for independent $\xi$ and $\eta$ and is by no means related to the
assumption that $\mathbf P(\eta = x) > 0$.

Fig. 4.1 Conditional expectation as the projection of $\xi$ onto $H_{\mathfrak A}$


Let $\mathfrak A$ be a sub-$\sigma$-algebra of $\mathfrak F$. We will now define the notion of the conditional
expectation of a random variable $\xi$ given $\mathfrak A$, which will be denoted by $\mathbf E(\xi \mid \mathfrak A)$. First
we will give the definition in the "discrete" case, but in such a way that it can easily
be extended.

Recall that we call discrete the case when the $\sigma$-algebra $\mathfrak A$ is formed (generated)
by an at most countable sequence of disjoint events $A_1, A_2, \ldots$, $\bigcup_j A_j = \Omega$,
$\mathbf P(A_j) > 0$. We will write this as $\mathfrak A = \sigma(A_1, A_2, \ldots)$, which means that the elements
of $\mathfrak A$ are all possible unions of the sets $A_1, A_2, \ldots$.

Let $L_2$ be the collection of all random variables (all the measurable functions
$\xi(\omega)$ defined on $(\Omega, \mathfrak F, \mathbf P)$) for which $\mathbf E\xi^2 < \infty$. In the linear space $L_2$ one
can introduce the inner product $(\xi, \eta) = \mathbf E(\xi\eta)$ (whereby $L_2$ becomes a Hilbert
space with the norm $\|\xi\| = (\mathbf E\xi^2)^{1/2}$; we identify two random variables $\xi_1$ and $\xi_2$ if
$\|\xi_1 - \xi_2\| = 0$, see also Remark 6.1.1).

Now consider the linear space $H_{\mathfrak A}$ of all functions of the form

$$\xi(\omega) = \sum_k c_k I_{A_k}(\omega),$$

where $I_{A_k}(\omega)$ are indicators of the sets $A_k$. The space $H_{\mathfrak A}$ is clearly the space of
all $\mathfrak A$-measurable functions, and one could think of it as the space spanned by the
orthogonal system $\{I_{A_k}(\omega)\}$ in $L_2$.

We now turn to the definition of conditional expectation. We know that the conventional
expectation $a = \mathbf E\xi$ of $\xi \in L_2$ can be defined as the unique point at which
the minimum value of the function $\varphi(a) = \mathbf E(\xi - a)^2$ is attained (see Sect. 4.5). Consider
now the problem of minimising the functional $\varphi(a) = \mathbf E(\xi - a(\omega))^2$, $\xi \in L_2$,
over all $\mathfrak A$-measurable functions $a(\omega)$ from $H_{\mathfrak A}$.

Definition 4.8.1 Let $\xi \in L_2$. The $\mathfrak A$-measurable function $a(\omega)$ on which the minimum
$\min_{a \in H_{\mathfrak A}} \varphi(a)$ is attained is said to be the conditional expectation of $\xi$ given
$\mathfrak A$ and is denoted by $\mathbf E(\xi \mid \mathfrak A)$.

Thus, unlike the conventional expectations, the conditional expectation $\mathbf E(\xi \mid \mathfrak A)$ is
a random variable. Let us consider it in more detail. It is evident that the minimum
of $\varphi(a)$ is attained when $a(\omega)$ is the projection $\hat\xi$ of the element $\xi$ in the space $L_2$
onto $H_{\mathfrak A}$, i.e. the element $\hat\xi \in H_{\mathfrak A}$ for which $\xi - \hat\xi \perp H_{\mathfrak A}$ (see Fig. 4.1). In that case,
for any $a \in H_{\mathfrak A}$,

$$\hat\xi - a \in H_{\mathfrak A}, \qquad \xi - \hat\xi \perp \hat\xi - a,$$
$$\varphi(a) = \mathbf E(\xi - \hat\xi + \hat\xi - a)^2 = \mathbf E(\xi - \hat\xi)^2 + \mathbf E(\hat\xi - a)^2,$$
$$\varphi(a) \ge \varphi(\hat\xi),$$

and $\varphi(a) = \varphi(\hat\xi)$ if $a = \hat\xi$ a.s.

Thus, in $L_2$ the conditional expectation operation is just an orthoprojector onto
$H_{\mathfrak A}$ ($\hat\xi = \mathbf E(\xi \mid \mathfrak A)$ is the projection of $\xi$ onto $H_{\mathfrak A}$).

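Since, for a discrete $\sigma$-algebra $\mathfrak A$, the element $\hat\xi$, being an element of $H_{\mathfrak A}$, has the
form $\hat\xi = \sum_k c_k I_{A_k}$, the orthogonality condition $\xi - \hat\xi \perp H_{\mathfrak A}$ (or, which is the same,
$\mathbf E(\xi - \hat\xi) I_{A_k} = 0$) determines uniquely the coefficients $c_k$:

$$\mathbf E(\xi I_{A_k}) = c_k \mathbf P(A_k), \qquad c_k = \frac{\mathbf E(\xi; A_k)}{\mathbf P(A_k)} = \mathbf E(\xi \mid A_k),$$

so that

$$\mathbf E(\xi \mid \mathfrak A) = \hat\xi = \sum_k \mathbf E(\xi \mid A_k) I_{A_k}.$$

Thus the random variable $\mathbf E(\xi \mid \mathfrak A)$ is constant on $A_k$ and, on these sets, is equal
to the average value of $\xi$ on $A_k$.

If $\xi$ and $\mathfrak A$ are independent (i.e. $\mathbf P(\xi \in B; A_k) = \mathbf P(\xi \in B)\mathbf P(A_k)$) then clearly
$\mathbf E(\xi; A_k) = \mathbf E\xi\,\mathbf P(A_k)$ and $\hat\xi = \mathbf E\xi$. If $\mathfrak A = \mathfrak F$ then $\mathfrak F$ is also discrete, $\xi$ is constant on
the sets $A_k$ and hence $\hat\xi = \xi$.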
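The discrete construction is easy to carry out by hand or by machine. The following Python sketch computes $\mathbf E(\xi \mid \mathfrak A)$ for a small finite probability space by averaging $\xi$ over each set $A_k$, and compares its mean square error with that of another function that is constant on the $A_k$. The sample space, the values of $\xi$, and the partition are all hypothetical illustration data.

```python
from fractions import Fraction

# Hypothetical finite probability space: six equally likely outcomes 0..5.
omega = range(6)
prob = {w: Fraction(1, 6) for w in omega}
xi = {0: 1, 1: 4, 2: 2, 3: 8, 4: 5, 5: 5}      # values of the random variable xi
partition = [{0, 1}, {2, 3, 4}, {5}]           # disjoint events A_1, A_2, A_3

def E(f, event=omega):
    """E(f; event): the integral of f over the event."""
    return sum(prob[w] * f[w] for w in event)

def P(event):
    return sum(prob[w] for w in event)

# Conditional expectation: on each A_k it equals c_k = E(xi; A_k) / P(A_k).
cond = {}
for A in partition:
    c = E(xi, A) / P(A)
    for w in A:
        cond[w] = c

def mse(a):
    """E(xi - a)^2 for a function a on omega."""
    return sum(prob[w] * (xi[w] - a[w]) ** 2 for w in omega)

# Another function that is constant on each A_k (shifted by 1 on A_2).
other = {w: cond[w] + (1 if w in partition[1] else 0) for w in omega}
```

Here `cond` takes the values 5/2, 5 and 5 on the three sets, integrates to the same value as `xi` over each $A_k$, and has a strictly smaller mean square error than `other`, as the projection argument predicts.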

Now note the following basic properties of conditional expectation which allow
one to get rid of the two special assumptions (that $\xi \in L_2$ and $\mathfrak A$ is discrete), which
were introduced at first to gain a better understanding of the nature of conditional
expectation:

(1) $\hat\xi$ is $\mathfrak A$-measurable.

(2) For any event $A \in \mathfrak A$,

$$\mathbf E(\hat\xi; A) = \mathbf E(\xi; A).$$

The former property is obvious. The latter follows from the fact that any event
$A \in \mathfrak A$ can be represented as $A = \bigcup_k A_{j_k}$, and hence

$$\mathbf E(\hat\xi; A) = \sum_k \mathbf E(\hat\xi; A_{j_k}) = \sum_k c_{j_k} \mathbf P(A_{j_k}) = \sum_k \mathbf E(\xi; A_{j_k}) = \mathbf E(\xi; A).$$

The meaning of this property is rather clear: averaging the variable $\xi$ over the set $A$
gives the same result as averaging the variable $\hat\xi$ which has already been averaged
over $A_{j_k}$.

Lemma 4.8.1 Properties (1) and (2) uniquely determine the conditional expectation
and are equivalent to Definition 4.8.1.

Proof In one direction the assertion of the lemma has already been proved. Assume
now that conditions (1) and (2) hold. $\mathfrak A$-measurability of $\hat\xi$ means that $\hat\xi$ is constant
on each set $A_k$. Denote by $c_k$ the value of $\hat\xi$ on $A_k$. Since $A_k \in \mathfrak A$, it follows from
property (2) that

$$\mathbf E(\hat\xi; A_k) = c_k \mathbf P(A_k) = \mathbf E(\xi; A_k),$$

and hence, for $\omega \in A_k$,

$$\hat\xi = c_k = \frac{\mathbf E(\xi; A_k)}{\mathbf P(A_k)}.$$

The lemma is proved. □


Now  we  can  give  the  general  definition  of  conditional  expectation. 

Definition 4.8.2 Let $\xi$ be a random variable on a probability space $(\Omega, \mathfrak F, \mathbf P)$ and
$\mathfrak A \subset \mathfrak F$ an arbitrary sub-$\sigma$-algebra of $\mathfrak F$. The conditional expectation of $\xi$ given $\mathfrak A$ is
a random variable $\hat\xi$ which is denoted by $\mathbf E(\xi \mid \mathfrak A)$ and has the following two properties:

(1) $\hat\xi$ is $\mathfrak A$-measurable.

(2) For any $A \in \mathfrak A$, one has $\mathbf E(\hat\xi; A) = \mathbf E(\xi; A)$.

In this definition, the random variable $\xi$ can be both scalar and vector-valued.
There immediately arises the question of whether such a random variable exists
and is unique. In the discrete case we saw that the answer to this question is positive.
In the general case, the following assertion holds true.

Theorem 4.8.1 If $\mathbf E|\xi|$ is finite, then the function $\hat\xi = \mathbf E(\xi \mid \mathfrak A)$ in Definition 4.8.2
always exists and is unique up to its values on a set of probability 0.

Proof First assume that $\xi$ is scalar and $\xi \ge 0$. Then the set function

$$Q(A) = \int_A \xi\,d\mathbf P = \mathbf E(\xi; A), \qquad A \in \mathfrak A,$$

will be a measure on $(\Omega, \mathfrak A)$ which is absolutely continuous with respect to $\mathbf P$, for
$\mathbf P(A) = 0$ implies $Q(A) = 0$. Therefore, by the Radon-Nikodym theorem (see Appendix 3),
there exists an $\mathfrak A$-measurable function $\hat\xi = \mathbf E(\xi \mid \mathfrak A)$ which is unique up to
its values on a set of measure zero and such that

$$Q(A) = \int_A \hat\xi\,d\mathbf P = \mathbf E(\hat\xi; A).$$

In the general case we put $\xi = \xi^+ - \xi^-$, where $\xi^+ := \max(0, \xi) \ge 0$,
$\xi^- := \max(0, -\xi) \ge 0$, $\hat\xi := \hat\xi^+ - \hat\xi^-$ and $\hat\xi^\pm$ are conditional expectations of $\xi^\pm$. This
proves the existence of the conditional expectation, since $\hat\xi$ satisfies conditions (1)
and (2) of Definition 4.8.2. This will also imply uniqueness, for the assumption
on non-uniqueness of $\hat\xi$ would imply non-uniqueness of $\hat\xi^+$ or $\hat\xi^-$. The proof for
vector-valued $\xi$ reduces to the one-dimensional case, since the components of $\hat\xi$ will
possess properties (1) and (2) and, for the components, the existence and uniqueness
have already been proved. □

The essence of the above proof is quite transparent: by condition (2), for any
$A \in \mathfrak A$ we are given the value

$$\mathbf E(\hat\xi; A) = \int_A \xi\,d\mathbf P,$$

i.e. the values of the integrals of $\hat\xi$ over all sets $A \in \mathfrak A$ are given. This clearly should
define an $\mathfrak A$-measurable function $\hat\xi$ uniquely up to its values on a set of measure
zero.

The meaning of $\mathbf E(\xi \mid \mathfrak A)$ remains the same: roughly speaking, this is the result of
averaging of $\xi$ over "indivisible" elements of $\mathfrak A$.

If $\mathfrak A = \mathfrak F$ then evidently $\hat\xi = \xi$ satisfies properties (1) and (2) and therefore

$$\mathbf E(\xi \mid \mathfrak F) = \xi.$$

Definition 4.8.3 Let $\xi$ and $\eta$ be random variables on $(\Omega, \mathfrak F, \mathbf P)$ and $\mathfrak A = \sigma(\eta)$ be
the $\sigma$-algebra generated by the random variable $\eta$. Then $\mathbf E(\xi \mid \mathfrak A)$ is also called the
conditional expectation of $\xi$ given $\eta$.

To simplify the notation, we will sometimes write $\mathbf E(\xi \mid \eta)$ instead of $\mathbf E(\xi \mid \sigma(\eta))$.
This does not lead to confusion.

Since $\mathbf E(\xi \mid \eta)$ is, by definition, a $\sigma(\eta)$-measurable random variable, this means
(see Sect. 3.5) that there exists a measurable function $g(x)$ for which $\mathbf E(\xi \mid \eta) = g(\eta)$.
By analogy with the discrete case, one can interpret the quantity $g(x)$ as the
result of averaging $\xi$ over the set $\{\eta = x\}$. (Recall that in the discrete case
$g(x) = \mathbf E(\xi \mid \eta = x)$.)

Definition 4.8.4 If $\xi = I_C$ is the indicator of a set $C \in \mathfrak F$, then $\mathbf E(I_C \mid \mathfrak A)$ is called the
conditional probability $\mathbf P(C \mid \mathfrak A)$ of the event $C$ given $\mathfrak A$. If $\mathfrak A = \sigma(\eta)$, we speak of
the conditional probability $\mathbf P(C \mid \eta)$ of the event $C$ given $\eta$.


4.8.2  Properties  of  Conditional  Expectations 

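1. Conditional expectations have the properties of conventional expectations, the
only difference being that they hold almost surely (with probability 1):

(a) $\mathbf E(a + b\xi \mid \mathfrak A) = a + b\,\mathbf E(\xi \mid \mathfrak A)$.

(b) $\mathbf E(\xi_1 + \xi_2 \mid \mathfrak A) = \mathbf E(\xi_1 \mid \mathfrak A) + \mathbf E(\xi_2 \mid \mathfrak A)$.

(c) If $\xi_1 \le \xi_2$ a.s., then $\mathbf E(\xi_1 \mid \mathfrak A) \le \mathbf E(\xi_2 \mid \mathfrak A)$ a.s.

To prove, for instance, property (a), one needs to verify, according to Definition 4.8.2,
that

(1) $a + b\,\mathbf E(\xi \mid \mathfrak A)$ is an $\mathfrak A$-measurable function;

(2) $\mathbf E(a + b\xi; A) = \mathbf E(a + b\,\mathbf E(\xi \mid \mathfrak A); A)$ for any $A \in \mathfrak A$.

Here (1) is evident; (2) follows from the linearity of conventional expectation (or
integral).

Property (b) is proved in the same way.

To prove (c), put, for brevity, $\hat\xi_i := \mathbf E(\xi_i \mid \mathfrak A)$. Then, for any $A \in \mathfrak A$,

$$\int_A \hat\xi_1\,d\mathbf P = \mathbf E(\hat\xi_1; A) = \mathbf E(\xi_1; A) \le \mathbf E(\xi_2; A) = \int_A \hat\xi_2\,d\mathbf P, \qquad \int_A (\hat\xi_2 - \hat\xi_1)\,d\mathbf P \ge 0.$$

This implies that $\hat\xi_2 - \hat\xi_1 \ge 0$ a.s.

2. Chebyshev's inequality. If $\xi \ge 0$, $x > 0$, then $\mathbf P(\xi \ge x \mid \mathfrak A) \le \mathbf E(\xi \mid \mathfrak A)/x$.

This property follows from 1(c), since $\mathbf P(\xi \ge x \mid \mathfrak A) = \mathbf E(I_{\{\xi \ge x\}} \mid \mathfrak A)$, where $I_A$ is
the indicator of the event $A$, and one has the inequality $I_{\{\xi \ge x\}} \le \xi/x$.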

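3. If $\mathfrak A$ and $\sigma(\xi)$ are independent, then $\mathbf E(\xi \mid \mathfrak A) = \mathbf E\xi$. Since $\hat\xi = \mathbf E\xi$ is an
$\mathfrak A$-measurable function, it remains to verify the second condition from Definition 4.8.2:
for any $A \in \mathfrak A$,

$$\mathbf E(\hat\xi; A) = \mathbf E(\xi; A).$$

This equality follows from the independence of the random variables $I_A$ and $\xi$ and
the relations $\mathbf E(\xi; A) = \mathbf E(\xi I_A) = \mathbf E\xi\,\mathbf E I_A = \mathbf E(\hat\xi; A)$.

It follows, in particular, that if $\xi$ and $\eta$ are independent, then $\mathbf E(\xi \mid \eta) = \mathbf E\xi$. If the
$\sigma$-algebra $\mathfrak A$ is trivial, then clearly one also has $\mathbf E(\xi \mid \mathfrak A) = \mathbf E\xi$.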

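4. Convergence theorems that are true for conventional expectations hold for
conditional expectations as well. For instance, the following assertion is true.

Theorem 4.8.2 (Monotone convergence theorem) If $0 \le \xi_n \uparrow \xi$ a.s. then

$$\mathbf E(\xi_n \mid \mathfrak A) \uparrow \mathbf E(\xi \mid \mathfrak A) \quad \text{a.s.}$$

Indeed, it follows from $\xi_{n+1} \ge \xi_n$ a.s. that $\hat\xi_{n+1} \ge \hat\xi_n$ a.s., where $\hat\xi_n = \mathbf E(\xi_n \mid \mathfrak A)$.
Therefore there exists an $\mathfrak A$-measurable random variable $\hat\xi$ such that $\hat\xi_n \uparrow \hat\xi$ a.s. By
the conventional monotone convergence theorem, for any $A \in \mathfrak A$,

$$\int_A \hat\xi_n\,d\mathbf P \to \int_A \hat\xi\,d\mathbf P, \qquad \int_A \xi_n\,d\mathbf P \to \int_A \xi\,d\mathbf P.$$

Since the left-hand sides of these relations coincide, the same holds for the right-hand
sides. This means that $\hat\xi = \mathbf E(\xi \mid \mathfrak A)$.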

5. If $\eta$ is an $\mathfrak A$-measurable scalar random variable, $\mathbf E|\xi| < \infty$, and $\mathbf E|\xi\eta| < \infty$,
then

$$\mathbf E(\eta\xi \mid \mathfrak A) = \eta\,\mathbf E(\xi \mid \mathfrak A). \qquad (4.8.3)$$

If $\xi \ge 0$ and $\eta \ge 0$ then the moment conditions are superfluous.

In other words, in regard to the conditional expectation operation, $\mathfrak A$-measurable
random variables behave as constants in conventional expectations (cf. property 1(a)).

In order to prove (4.8.3), note that if $\eta = I_B$ (the indicator of a set $B \in \mathfrak A$) then
the assertion holds since, for any $A \in \mathfrak A$,

$$\int_A \mathbf E(I_B\xi \mid \mathfrak A)\,d\mathbf P = \int_A I_B\xi\,d\mathbf P = \int_{AB} \xi\,d\mathbf P = \int_{AB} \mathbf E(\xi \mid \mathfrak A)\,d\mathbf P = \int_A I_B\,\mathbf E(\xi \mid \mathfrak A)\,d\mathbf P.$$

This together with the linearity of conditional expectations implies that the assertion
holds for all simple functions $\eta$.

If $\xi \ge 0$ and $\eta \ge 0$ then, taking a sequence of simple functions $0 \le \eta_n \uparrow \eta$ and
applying the monotone convergence theorem to the equality

$$\mathbf E(\eta_n\xi \mid \mathfrak A) = \eta_n\,\mathbf E(\xi \mid \mathfrak A),$$

we obtain (4.8.3). Transition to the case of arbitrary $\xi$ and $\eta$ is carried out in the
standard way, by considering positive and negative parts of the random variables
$\xi$ and $\eta$. In addition, to ensure that the arising differences and sums make sense, we
require the existence of the expectations $\mathbf E|\xi|$ and $\mathbf E|\xi\eta|$.

6.  All  the  basic  inequalities  for  conventional  expectations  remain  true  for  condi¬ 
tional  expectations  as  well,  in  particular,  Cauchy-Bunjakov  sky’s  inequality 

E(|£i£2||2l)  <  [E(f^|2l)E(f2  |2l)]1/2 
and  Jensen  s  inequality :  if  E|§  |  <  oo  then,  for  any  convex  function  g, 

g(E(£|2l))  < E(g(£)|2l).  (4.8.4) 

Cauchy-Bunjakovsky’s  inequality  can  be  proved  in  exactly  the  same  way  as  for 
conventional  expectations,  for  its  proof  requires  no  properties  of  expectations  other 
than  linearity. 

Jensen's inequality is a consequence of the following relation. By convexity of
$g(x)$, for any $y$, there exists a number $g^*(y)$ such that $g(x) \ge g(y) + (x - y)g^*(y)$
($g^*(y) = g'(y)$ if $g$ is differentiable at the point $y$). Put $x = \xi$, $y = \tilde\xi = \mathbf{E}(\xi \mid \mathfrak{A})$, and
take conditional expectations of both sides of the inequality. Then, assuming for
the moment that

\[ \mathbf{E}|(\xi - \tilde\xi)g^*(\tilde\xi)| < \infty \tag{4.8.5} \]

(this can be proved if $\mathbf{E}|g(\xi)| < \infty$), we get

\[ \mathbf{E}[(\xi - \tilde\xi)g^*(\tilde\xi) \mid \mathfrak{A}] = g^*(\tilde\xi)\,\mathbf{E}(\xi - \tilde\xi \mid \mathfrak{A}) = 0 \]

by virtue of property 5. Thus we obtain (4.8.4). In the general case note that the
function $g^*(y)$ is nondecreasing. Let $(y_{-N}, y_N)$ be the maximal interval on which
$|g^*(y)| < N$. Put


\[ g_N(y) := \begin{cases} g(y) & \text{if } y \in [y_{-N}, y_N], \\ g(y_{\pm N}) \pm (y - y_{\pm N})N & \text{if } y \gtrless y_{\pm N}. \end{cases} \]


($y_{\pm N}$ can take infinite values if $\pm g^*(y)$ are bounded as $y \to \pm\infty$. Note that the values
of $g^*(y)$ are always bounded from below as $y \to \infty$ and from above as $y \to -\infty$,
hence $g^*(y_{\pm N}) \gtrless 0$ for $N$ large enough.) The support function $g_N^*(y)$ corresponding
to $g_N(y)$ has the form

\[ g_N^*(y) = \max[-N, \min(N, g^*(y))] \]

and, consequently, is bounded for each $N$. Therefore, condition (4.8.5) is satisfied
for $g_N(y)$ (recall that $\mathbf{E}|\xi| < \infty$) and hence

\[ g_N(\tilde\xi) \le \mathbf{E}(g_N(\xi) \mid \mathfrak{A}). \]

Further, we have $g_N(y) \uparrow g(y)$ as $N \to \infty$ for all $y$. Therefore the left-hand side
of this inequality converges everywhere to $g(\tilde\xi)$ as $N \to \infty$, but the right-hand side
converges to $\mathbf{E}(g(\xi) \mid \mathfrak{A})$ by Theorem 4.8.2. Property 6 is proved. □

7. The total probability formula

\[ \mathbf{E}\xi = \mathbf{E}\,\mathbf{E}(\xi \mid \mathfrak{A}) \]

follows immediately from property 2 of Definition 4.8.2 with $A = \Omega$.


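8. Iterated averaging (an extension of property 7): if $\mathfrak{A} \subset \mathfrak{A}_1$ then

\[ \mathbf{E}(\xi \mid \mathfrak{A}) = \mathbf{E}[\mathbf{E}(\xi \mid \mathfrak{A}_1) \mid \mathfrak{A}]. \]

Indeed, for any $A \in \mathfrak{A}$, since $A \in \mathfrak{A}_1$ one has

\[ \int_A \mathbf{E}[\mathbf{E}(\xi \mid \mathfrak{A}_1) \mid \mathfrak{A}]\,d\mathbf{P} = \int_A \mathbf{E}(\xi \mid \mathfrak{A}_1)\,d\mathbf{P} = \int_A \xi\,d\mathbf{P} = \int_A \mathbf{E}(\xi \mid \mathfrak{A})\,d\mathbf{P}. \]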
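The iterated averaging identity can be checked exactly on a finite probability space, where a sub-$\sigma$-algebra is generated by a partition and the conditional expectation is the probability-weighted average over each block. The following Python sketch is only an illustration: the six-point space, the probabilities, the values of $\xi$ and the two partitions (the names `fine` and `coarse`) are assumptions made for the example and are not taken from the text.

```python
from fractions import Fraction

def cond_exp(values, probs, partition):
    """Return E(X | sigma(partition)) as a list indexed by outcome."""
    result = [None] * len(values)
    for block in partition:
        mass = sum(probs[w] for w in block)
        average = sum(values[w] * probs[w] for w in block) / mass
        for w in block:
            result[w] = average  # constant on each block of the partition
    return result

# Hypothetical six-point probability space (illustrative numbers only).
probs = [Fraction(n, 10) for n in (1, 2, 3, 1, 2, 1)]
xi = [Fraction(v) for v in (4, -1, 7, 0, 2, 5)]
fine = [[0], [1, 2], [3], [4, 5]]   # partition generating the larger algebra A_1
coarse = [[0, 1, 2], [3, 4, 5]]     # partition generating A, contained in A_1

direct = cond_exp(xi, probs, coarse)                           # E(xi | A)
iterated = cond_exp(cond_exp(xi, probs, fine), probs, coarse)  # E[E(xi | A_1) | A]
print(direct == iterated)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the two sides can be compared with `==` rather than up to rounding.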

The properties 1, 3-5, 7 and 8 clearly hold for both scalar- and vector-valued
random variables $\xi$. The next property we will single out.


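9. For $\xi \in L_2$, the minimum of $\mathbf{E}(\xi - a(\omega))^2$ over all $\mathfrak{A}$-measurable functions
$a(\omega)$ is attained at $a(\omega) = \mathbf{E}(\xi \mid \mathfrak{A})$.

Indeed, $\mathbf{E}(\xi - a(\omega))^2 = \mathbf{E}\,\mathbf{E}((\xi - a(\omega))^2 \mid \mathfrak{A})$, but $a(\omega)$ behaves as a constant in
what concerns the operation $\mathbf{E}(\cdot \mid \mathfrak{A})$ (see property 5), so that

\[ \mathbf{E}((\xi - a(\omega))^2 \mid \mathfrak{A}) = \mathbf{E}((\xi - \mathbf{E}(\xi \mid \mathfrak{A}))^2 \mid \mathfrak{A}) + (\mathbf{E}(\xi \mid \mathfrak{A}) - a(\omega))^2 \]

and the minimum of this expression is attained at $a(\omega) = \mathbf{E}(\xi \mid \mathfrak{A})$.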
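The decomposition behind property 9 can be verified exactly on a small finite space. The Python sketch below is a hedged illustration only: the four-point space, the partition `blocks` and the grid of competing block-constant functions are assumptions chosen for the example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical four-point space; the partition `blocks` generates the sub-algebra.
probs = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 6), Fraction(1, 3)]
xi = [Fraction(3), Fraction(-1), Fraction(2), Fraction(5)]
blocks = [[0, 1], [2, 3]]

def block_constant(levels):
    """Build a function a(omega) that is constant on each block (measurable)."""
    a = [None] * len(probs)
    for block, level in zip(blocks, levels):
        for w in block:
            a[w] = Fraction(level)
    return a

def mean_square(u, v):
    """E(u - v)^2 on the finite space."""
    return sum(p * (x - y) ** 2 for p, x, y in zip(probs, u, v))

# Conditional expectation: probability-weighted average of xi over each block.
cond_levels = [sum(probs[w] * xi[w] for w in b) / sum(probs[w] for w in b)
               for b in blocks]
cond_mean = block_constant(cond_levels)

# Compare with other measurable candidates a(omega) on a small grid of levels.
candidates = [block_constant(levels) for levels in product(range(-2, 7), repeat=2)]
excess = [mean_square(xi, a) - mean_square(xi, cond_mean) for a in candidates]
print(min(excess) >= 0)
```

For every candidate the excess equals $\mathbf{E}(\mathbf{E}(\xi \mid \mathfrak{A}) - a(\omega))^2$, which is the expectation of the second term in the displayed decomposition.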

This property proves the equivalence of Definitions 4.8.1 and 4.8.2 in the case
when $\xi \in L_2$ (in both definitions, conditional expectation is defined up to its values
on a set of measure 0). In this connection note once again that, in $L_2$, the operation
of taking conditional expectations is the projection onto $H_{\mathfrak{A}}$ (see our comments to
Definition 4.8.1).

Property 9 can be extended to the multivariate case in the following form: for any
nonnegative definite matrix $V$, the minimum $\min \mathbf{E}(\xi - a(\omega))V(\xi - a(\omega))^T$ over all
$\mathfrak{A}$-measurable functions $a(\omega)$ is attained at $a(\omega) = \mathbf{E}(\xi \mid \mathfrak{A})$.

The assertions proved above in the case where $\xi \in L_2$ and the $\sigma$-algebra $\mathfrak{A}$ is
countably generated will surely hold true for an arbitrary $\sigma$-algebra $\mathfrak{A}$, but the
substantiation of this fact requires additional work.

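In conclusion we note that property 5 admits, under wide assumptions, the following
generalisation:

5A. If $\eta$ is $\mathfrak{A}$-measurable and $g(\omega, y)$ is a measurable function of its arguments
$\omega \in \Omega$ and $y \in \mathbb{R}^k$ such that $\mathbf{E}(|g(\omega, \eta)| \mid \mathfrak{A}) < \infty$, then

\[ \mathbf{E}(g(\omega, \eta) \mid \mathfrak{A}) = \mathbf{E}(g(\omega, y) \mid \mathfrak{A})\big|_{y=\eta}. \tag{4.8.6} \]

This implies the double expectation (or total probability) formula

\[ \mathbf{E} g(\omega, \eta) = \mathbf{E}\big[\mathbf{E}(g(\omega, y) \mid \mathfrak{A})\big|_{y=\eta}\big], \]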
which can be considered as an extension of Fubini's theorem (see Sects. 4.6 and 3.6).
Indeed, if $g(\omega, y)$ is independent of $\mathfrak{A}$, then

\[ \mathbf{E}(g(\omega, y) \mid \mathfrak{A}) = \mathbf{E} g(\omega, y), \qquad \mathbf{E}(g(\omega, \eta) \mid \mathfrak{A}) = \mathbf{E} g(\omega, y)\big|_{y=\eta}, \]

\[ \mathbf{E} g(\omega, \eta) = \mathbf{E}\big[\mathbf{E} g(\omega, y)\big|_{y=\eta}\big]. \]

In regard to its form, this is Fubini's theorem, but here $\eta$ is a vector-valued random
variable, while $\omega$ can be of an arbitrary nature.

We will prove property 5A under the simplifying assumption that there exists
a sequence of simple functions $\eta_n$ such that $g(\omega, \eta_n) \uparrow g(\omega, \eta)$ and $h(\omega, \eta_n) \uparrow
h(\omega, \eta)$ a.s., where $h(\omega, y) = \mathbf{E}(g(\omega, y) \mid \mathfrak{A})$. Indeed, let $\eta_n = y_k$ for $\omega \in A_k \in \mathfrak{A}$.
Then

\[ g(\omega, \eta_n) = \sum_k g(\omega, y_k)\,I_{A_k}. \]

By property 5 it follows that (4.8.6) holds for the functions $\eta_n$. It remains to
make use of the monotone convergence theorem (property 4) in the equality
$\mathbf{E}(g(\omega, \eta_n) \mid \mathfrak{A}) = h(\omega, \eta_n)$.


4.9  Conditional  Distributions 

Along with conditional expectations, one can consider conditional distributions
given sub-$\sigma$-algebras and random variables. In the present section, we turn our
attention to the latter.

Let $\xi$ and $\eta$ be two random variables on $(\Omega, \mathfrak{F}, \mathbf{P})$ taking values in $\mathbb{R}^s$ and $\mathbb{R}^k$,
respectively, and let $\mathfrak{B}^s$ be the $\sigma$-algebra of Borel sets in $\mathbb{R}^s$.

Definition 4.9.1 A function $F(B \mid y)$ of two variables $y \in \mathbb{R}^k$ and $B \in \mathfrak{B}^s$ is called
the conditional distribution of $\xi$ given $\eta = y$ if:

1. For any $B$, $F(B \mid \eta)$ is the conditional probability $\mathbf{P}(\xi \in B \mid \eta)$ of the event
$\{\xi \in B\}$ given $\eta$, i.e. $F(B \mid y)$ is a Borel function of $y$ such that, for any $A \in \mathfrak{B}^k$,

\[ \mathbf{E}(F(B \mid \eta);\ \eta \in A) = \int_A F(B \mid y)\,\mathbf{P}(\eta \in dy) = \mathbf{P}(\xi \in B,\ \eta \in A). \]

2. For any $y$, $F(B \mid y)$ is a probability distribution in $B$.

Sometimes we will write the function $F(B \mid y)$ in a more “deciphered” form as
$F(B \mid y) = \mathbf{P}(\xi \in B \mid \eta = y)$.

We know that, for each $B \in \mathfrak{B}^s$, there exists a Borel function $g_B(y)$ such that
$g_B(\eta) = \mathbf{P}(\xi \in B \mid \eta)$. Thus, putting $F(B \mid y) := g_B(y)$, we will satisfy condition 1
of Definition 4.9.1. Condition 2, however, does not follow from the properties of
conditional expectations and by no means needs to hold. Indeed, the conditional
probability $\mathbf{P}(\xi \in B \mid \eta)$ is defined for each $B$ only up to its values on a set $N_B$ of zero
measure (so that there exist many variants of conditional expectation), and this set
can be different for each $B$. Therefore, if the union

\[ N = \bigcup_{B \in \mathfrak{B}^s} N_B \]

has a non-zero probability, it could turn out that, for instance, the equalities

\[ \mathbf{P}(\xi \in B_1 \cup B_2 \mid \eta) = \mathbf{P}(\xi \in B_1 \mid \eta) + \mathbf{P}(\xi \in B_2 \mid \eta) \]

(additivity of probability) for all disjoint $B_1$ and $B_2$ from $\mathfrak{B}^s$ hold for no $\omega \in N$, i.e.
on an $\omega$-set $N$ of positive probability, the function $g_B(y)$ will not be a distribution
as a function of $B$.

However, in the case when $\xi$ is a random variable taking values in $\mathbb{R}^s$ with the
$\sigma$-algebra $\mathfrak{B}^s$ of Borel sets, one can always choose $g_B(\eta) = \mathbf{P}(\xi \in B \mid \eta)$ such that
$g_B(y)$ will be a conditional distribution.²

As  one  might  expect,  conditional  probabilities  possess  the  natural  property  that 
conditional  expectations  can  be  expressed  as  integrals  with  respect  to  conditional 
distributions. 


Theorem 4.9.1 For any measurable function $g(x)$ mapping $\mathbb{R}^s$ into $\mathbb{R}$ such that
$\mathbf{E}|g(\xi)| < \infty$, one has

\[ \mathbf{E}(g(\xi) \mid \eta) = \int g(x)\,F(dx \mid \eta). \tag{4.9.1} \]

Proof It suffices to consider the case $g(x) \ge 0$. If $g(x) = I_A(x)$ is the indicator of
a set $A$, then formula (4.9.1) clearly holds. Therefore it holds for any simple (i.e.
assuming only finitely many values) function $g_n(x)$. It remains to take a sequence
$g_n \uparrow g$ and make use of the monotonicity of both sides of (4.9.1) and property 4
from Sect. 4.8. □


In real-life problems, to compute conditional distributions one can often use the
following simple rule which we will write in the form

\[ \mathbf{P}(\xi \in B \mid \eta = y) = \frac{\mathbf{P}(\xi \in B,\ \eta \in dy)}{\mathbf{P}(\eta \in dy)}. \tag{4.9.2} \]

Both conditions of Definition 4.9.1 will clearly be formally satisfied.

If $\xi$ and $\eta$ have a joint density, this equality will have a precise meaning.


Definition 4.9.2 Assume that, for each $y$, the conditional distribution $F(B \mid y)$ is
absolutely continuous with respect to some measure $\mu$ in $\mathbb{R}^s$:

\[ \mathbf{P}(\xi \in B \mid \eta = y) = \int_B f(x \mid y)\,\mu(dx). \]

Then the density $f(x \mid y)$ is said to be the conditional density of $\xi$ (with respect to
the measure $\mu$) given $\eta = y$.

² For more details, see e.g. [12, 14, 26].
In other words, a measurable function $f(x \mid y)$ of two variables $x$ and $y$ is the
conditional density of $\xi$ given $\eta = y$ if:

(1) For any Borel sets $A \subset \mathbb{R}^k$ and $B \subset \mathbb{R}^s$,

\[ \int_{y \in A}\int_{x \in B} f(x \mid y)\,\mu(dx)\,\mathbf{P}(\eta \in dy) = \mathbf{P}(\xi \in B,\ \eta \in A). \tag{4.9.3} \]

(2) For any $y$, the function $f(x \mid y)$ is a probability density.


It follows from Theorem 4.9.1 that if there exists a conditional density, then

\[ \mathbf{E}(g(\xi) \mid \eta) = \int g(x) f(x \mid \eta)\,\mu(dx). \]

If we additionally assume that the distribution of $\eta$ has a density $q(y)$ with respect
to some measure $\lambda$ in $\mathbb{R}^k$, then we can re-write (4.9.3) in the form

\[ \int_{y \in A}\int_{x \in B} f(x \mid y) q(y)\,\mu(dx)\,\lambda(dy) = \mathbf{P}(\xi \in B,\ \eta \in A). \tag{4.9.4} \]


Consider now the direct product of spaces $\mathbb{R}^s$ and $\mathbb{R}^k$ and the direct product of
measures $\mu \times \lambda$ on it (if $C = B \times A$, $B \subset \mathbb{R}^s$, $A \subset \mathbb{R}^k$ then $\mu \times \lambda(C) = \mu(B)\lambda(A)$).
In the product space, relation (4.9.4) evidently means that the joint distribution of $\xi$
and $\eta$ in $\mathbb{R}^s \times \mathbb{R}^k$ has a density with respect to $\mu \times \lambda$ which is equal to

\[ f(x, y) = f(x \mid y) q(y). \]

The converse assertion is also true.


Theorem 4.9.2 If the joint distribution of $\xi$ and $\eta$ in $\mathbb{R}^s \times \mathbb{R}^k$ has a density $f(x, y)$
with respect to $\mu \times \lambda$, then the function

\[ f(x \mid y) = \frac{f(x, y)}{q(y)}, \quad \text{where } q(y) = \int f(x, y)\,\mu(dx), \]

is the conditional density of $\xi$ given $\eta = y$, and the function $q(y)$ is the density of $\eta$
with respect to the measure $\lambda$.


Proof The assertion on $q(y)$ is obvious, since

\[ \int_A q(y)\,\lambda(dy) = \mathbf{P}(\eta \in A). \]

It remains to observe that $f(x \mid y) = f(x, y)/q(y)$ satisfies all the conditions from
Definition 4.9.2 of conditional density (equality (4.9.4), which is equivalent to
(4.9.3), clearly holds here). □

Theorem 4.9.2 gives a precise meaning to (4.9.2) when $\xi$ and $\eta$ have densities.

Example 4.9.1 Let $\xi_1$ and $\xi_2$ be independent random variables, $\xi_1 \subset= \mathbf{\Pi}_{\lambda_1}$, $\xi_2 \subset= \mathbf{\Pi}_{\lambda_2}$.
What is the distribution of $\xi_1$ given $\xi_1 + \xi_2 = n$? We could easily compute the desired
conditional probability $\mathbf{P}(\xi_1 = k \mid \xi_1 + \xi_2 = n)$, $k \le n$, without using Theorem 4.9.2,
for $\xi_1 + \xi_2 \subset= \mathbf{\Pi}_{\lambda_1 + \lambda_2}$ and the probability of the event $\{\xi_1 + \xi_2 = n\}$ is
positive. Retaining this possibility for comparison, we will still make formal use of
Theorem 4.9.2. Here $\xi_1$ and $\eta = \xi_1 + \xi_2$ have densities (equal to the corresponding
probabilities) with respect to the counting measure, so that

\[ f(k, n) = \mathbf{P}(\xi_1 = k,\ \eta = n) = \mathbf{P}(\xi_1 = k,\ \xi_2 = n - k) = e^{-\lambda_1 - \lambda_2}\,\frac{\lambda_1^k \lambda_2^{n-k}}{k!\,(n - k)!}, \]

\[ q(n) = \mathbf{P}(\eta = n) = e^{-\lambda_1 - \lambda_2}\,\frac{(\lambda_1 + \lambda_2)^n}{n!}. \]

Therefore the required density (probability) is equal to

\[ f(k \mid n) = \mathbf{P}(\xi_1 = k \mid \eta = n) = \frac{f(k, n)}{q(n)} = \frac{n!}{k!\,(n - k)!}\,p^k (1 - p)^{n-k}, \]

where $p = \lambda_1/(\lambda_1 + \lambda_2)$. Thus the conditional distribution of $\xi_1$ given the fixed sum
$\xi_1 + \xi_2 = n$ is a binomial distribution. In particular, if $\xi_1, \ldots, \xi_r$ are independent,
$\xi_i \subset= \mathbf{\Pi}_{\lambda}$, then the conditional distribution of $\xi_1$ given the fixed sum $\xi_1 + \cdots + \xi_r = n$
will be $\mathbf{B}_{1/r}^n$, which does not depend on $\lambda$.
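The computation in Example 4.9.1 is easy to reproduce numerically. The Python sketch below forms the ratio of the joint and marginal Poisson probabilities and compares it with the binomial probabilities with $p = \lambda_1/(\lambda_1 + \lambda_2)$; the parameter values `lam1`, `lam2`, `n` are illustrative assumptions.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with parameter lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def conditional_pmf(k, n, lam1, lam2):
    """f(k | n) = f(k, n) / q(n), densities taken w.r.t. the counting measure."""
    joint = poisson_pmf(k, lam1) * poisson_pmf(n - k, lam2)
    marginal = poisson_pmf(n, lam1 + lam2)
    return joint / marginal

def binomial_pmf(k, n, p):
    """Binomial probability with parameters (n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

lam1, lam2, n = 1.5, 2.5, 6      # illustrative parameter values
p = lam1 / (lam1 + lam2)
for k in range(n + 1):
    print(k, conditional_pmf(k, n, lam1, lam2), binomial_pmf(k, n, p))
```

The two printed columns should agree up to floating-point rounding for every $k = 0, \ldots, n$.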

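The next example answers the same question as in Example 4.9.1 but for normally
distributed random variables.

Example 4.9.2 Let $\mathbf{\Phi}_{a,\sigma^2}$ be the two-dimensional joint normal distribution of random
variables $\xi_1$ and $\xi_2$, where $a = (a_1, a_2)$, $a_i = \mathbf{E}\xi_i$, and $\sigma^2 = \|\sigma_{ij}\|$ is the covariance
matrix, $\sigma_{ij} = \mathbf{E}(\xi_i - a_i)(\xi_j - a_j)$, $i, j = 1, 2$. The determinant of $\sigma^2$ is

\[ |\sigma^2| = \sigma_{11}\sigma_{22} - \sigma_{12}^2 = \sigma_{11}\sigma_{22}(1 - \rho^2), \]

where $\rho$ is the correlation coefficient of $\xi_1$ and $\xi_2$. Thus, if $|\rho| \ne 1$ then the covariance
matrix is non-degenerate and has the inverse

\[ (\sigma^2)^{-1} = \frac{1}{|\sigma^2|}\begin{pmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{pmatrix} = \frac{1}{1 - \rho^2}\begin{pmatrix} \dfrac{1}{\sigma_{11}} & -\dfrac{\rho}{\sqrt{\sigma_{11}\sigma_{22}}} \\[2mm] -\dfrac{\rho}{\sqrt{\sigma_{11}\sigma_{22}}} & \dfrac{1}{\sigma_{22}} \end{pmatrix}. \]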

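Therefore the joint density of $\xi_1$ and $\xi_2$ (with respect to Lebesgue measure) is (see
Sect. 3.3)

\[ f(x, y) = \frac{1}{2\pi\sqrt{\sigma_{11}\sigma_{22}(1 - \rho^2)}} \exp\Big\{-\frac{1}{2(1 - \rho^2)}\Big[\frac{(x - a_1)^2}{\sigma_{11}} - \frac{2\rho(x - a_1)(y - a_2)}{\sqrt{\sigma_{11}\sigma_{22}}} + \frac{(y - a_2)^2}{\sigma_{22}}\Big]\Big\}. \tag{4.9.5} \]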


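The one-dimensional densities of $\xi_1$ and $\xi_2$ are, respectively,

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma_{11}}}\,e^{-(x - a_1)^2/(2\sigma_{11})}, \qquad q(y) = \frac{1}{\sqrt{2\pi\sigma_{22}}}\,e^{-(y - a_2)^2/(2\sigma_{22})}. \tag{4.9.6} \]

Hence the conditional density of $\xi_1$ given $\xi_2 = y$ is

\[ f(x \mid y) = \frac{f(x, y)}{q(y)} = \frac{1}{\sqrt{2\pi\sigma_{11}(1 - \rho^2)}} \exp\Big\{-\frac{1}{2\sigma_{11}(1 - \rho^2)}\Big(x - a_1 - \rho\sqrt{\frac{\sigma_{11}}{\sigma_{22}}}\,(y - a_2)\Big)^2\Big\}, \]

which is the density of the normal distribution with mean $a_1 + \rho\sqrt{\sigma_{11}/\sigma_{22}}\,(y - a_2)$ and
variance $\sigma_{11}(1 - \rho^2)$.

[Fig. 4.2 Illustration to Example 4.9.4. Positions of the target's centre, the first
aimpoint, and the first hit]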

This implies that $f(x \mid y)$ coincides with the unconditional density $f(x)$ in
the case $\rho = 0$ (and hence $\xi_1$ and $\xi_2$ are independent), and that the conditional
expectation of $\xi_1$ given $\xi_2$ is

\[ \mathbf{E}(\xi_1 \mid \xi_2) = a_1 + \rho\sqrt{\sigma_{11}/\sigma_{22}}\,(\xi_2 - a_2). \]

The straight line $x = a_1 + \rho\sqrt{\sigma_{11}/\sigma_{22}}\,(y - a_2)$ is called the regression line of $\xi_1$
on $\xi_2$. It gives the best mean-square approximation for $\xi_1$ given $\xi_2 = y$.
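The algebra of Example 4.9.2 can be checked pointwise: the ratio of the joint normal density (4.9.5) to the marginal density $q(y)$ from (4.9.6) should coincide with the normal density whose mean lies on the regression line and whose variance is $\sigma_{11}(1 - \rho^2)$. The Python sketch below does this at a few points; the parameter values and the test points are illustrative assumptions.

```python
import math

def normal_density(x, mean, var):
    """One-dimensional normal density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def joint_density(x, y, a1, a2, s11, s22, rho):
    """Two-dimensional normal density (4.9.5)."""
    norm = 2 * math.pi * math.sqrt(s11 * s22 * (1 - rho ** 2))
    quad = ((x - a1) ** 2 / s11
            - 2 * rho * (x - a1) * (y - a2) / math.sqrt(s11 * s22)
            + (y - a2) ** 2 / s22)
    return math.exp(-quad / (2 * (1 - rho ** 2))) / norm

def conditional_density(x, y, a1, a2, s11, s22, rho):
    """f(x | y) = f(x, y) / q(y), with q the marginal density of the second coordinate."""
    return joint_density(x, y, a1, a2, s11, s22, rho) / normal_density(y, a2, s22)

def regression_mean(y, a1, a2, s11, s22, rho):
    """Point on the regression line of the first coordinate on the second."""
    return a1 + rho * math.sqrt(s11 / s22) * (y - a2)

params = dict(a1=1.0, a2=-2.0, s11=4.0, s22=9.0, rho=0.6)   # illustrative values
points = [(0.0, -1.0), (2.5, 0.5), (-1.0, -4.0)]
for x, y in points:
    direct = conditional_density(x, y, **params)
    closed = normal_density(x, regression_mean(y, **params),
                            params["s11"] * (1 - params["rho"] ** 2))
    print(x, y, direct, closed)
```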


Example 4.9.3 Consider the problem of computing the density of the random variable
$\xi = \varphi(\zeta, \eta)$ when $\zeta$ and $\eta$ are independent. It follows from formula (4.9.3)
with $A = \mathbb{R}^k$ that the density of the distribution of $\xi$ can be expressed in terms of
the conditional density $f(x \mid y)$ as

\[ f(x) = \int f(x \mid y)\,\mathbf{P}(\eta \in dy). \]

In our problem, by $f(x \mid y)$ one should understand the density of the random variable
$\varphi(\zeta, y)$, since $\mathbf{P}(\xi \in B \mid \eta = y) = \mathbf{P}(\varphi(\zeta, y) \in B)$.


Example 4.9.4 Target shooting with adjustment. A gun fires at a target of a known
geometric form. Introduce the polar system of coordinates, of which the origin is
the position of the gun. The distance $r$ (see Fig. 4.2) from the gun to a certain point
which is assumed to be the centre of the target is precisely known to the crew of the
gun, while the azimuth is not. However, there is a spotter who communicates to the
crew after the first trial shot what the azimuth deviation of the hitting point from
the centre of the target is.

Suppose the scatter of the shells fired by the gun (the deviation $(\xi, \eta)$ of the hitting
point from the aimpoint) is described, in the polar system of coordinates, by the
two-dimensional normal distribution with density (4.9.5) with $a = 0$. In Sect. 8.4 we
will see why the deviation is normally distributed. Here we will neglect the circumstance
that the azimuth deviation $\eta$ cannot exceed $\pi$ while the distance deviation $\xi$
cannot assume values in $(-\infty, -r)$. (The standard deviations $\sigma_1$ and $\sigma_2$ are usually
very small in comparison with $r$ and $\pi$, so this fact is insignificant.) If the azimuth $\beta$
of the centre of the target were also exactly known along with the distance $r$, then
the probability of hitting the target would be equal to

\[ \iint_{B(r,\beta)} f(x, y)\,dx\,dy, \]

where $B(r, \beta) = \{(x, y) : (r + x, \beta + y) \in B\}$ and the set $B$ represents the target.
However, the azimuth is communicated to the crew of the gun by the spotter based
on the result of the trial shot, i.e. the spotter reports it with an error $\delta$ distributed
according to the normal law with the density $q(y)$ (see (4.9.6)). What is the probability
of the event $A$ that, in these circumstances, the gun will hit the target from the
second shot? If $\delta = z$, then the azimuth is communicated with the error $z$ and

\[ \mathbf{P}(A \mid \delta = z) = \iint_{B(r,\beta)} f(x, y - z)\,dx\,dy =: \varphi(z). \]

Therefore,

\[ \mathbf{P}(A) = \mathbf{E}[\mathbf{P}(A \mid \delta)] = \mathbf{E}\varphi(\delta) = \frac{1}{\sigma_2\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-z^2/(2\sigma_2^2)}\,\varphi(z)\,dz. \]


Example  4.9.5  The  segment  [0,  1]  is  broken  “at  random”  (i.e.  with  the  uniform 
distribution  of  the  breaking  point)  into  two  parts.  Then  the  larger  part  is  also  broken 
“at  random”  into  two  parts.  What  is  the  probability  that  one  can  form  a  triangle  from 
the  three  fragments? 

The triangle can be formed if there occurs the event $B$ that all the three fragments
have lengths smaller than 1/2. Let $\omega_1$ and $\omega_2$ be the distances from the points of the
first and second breaks to the origin. Use the complete probability formula

\[ \mathbf{P}(B) = \mathbf{E}\,\mathbf{P}(B \mid \omega_1). \]

Since $\omega_1$ is distributed uniformly over $[0, 1]$, one only has to calculate the conditional
probability $\mathbf{P}(B \mid \omega_1)$. If $\omega_1 < 1/2$ then $\omega_2$ is distributed uniformly over
$[\omega_1, 1]$. One can construct a triangle provided that $1/2 < \omega_2 < 1/2 + \omega_1$. Therefore
$\mathbf{P}(B \mid \omega_1) = \omega_1/(1 - \omega_1)$ on the set $\{\omega_1 < 1/2\}$. We easily find from symmetry that,
for $\omega_1 > 1/2$,

\[ \mathbf{P}(B \mid \omega_1) = \frac{1 - \omega_1}{\omega_1}. \]

Hence

\[ \mathbf{P}(B) = 2\int_0^{1/2} \frac{x}{1 - x}\,dx = -1 + 2\int_0^{1/2} \frac{dx}{1 - x} = -1 + 2\ln 2. \]
One could also solve this problem using a direct “geometric” method. The density
$f(x, y)$ of the joint distribution of $(\omega_1, \omega_2)$ is

\[ f(x, y) = \begin{cases} \dfrac{1}{1 - x} & \text{if } x < 1/2,\ y \in [x, 1], \\[2mm] \dfrac{1}{x} & \text{if } x \ge 1/2,\ y \in [0, x], \\[2mm] 0 & \text{otherwise.} \end{cases} \]

It remains to compute the integral of this function over the domain corresponding
to $B$.
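The value $-1 + 2\ln 2$ in Example 4.9.5 can be confirmed by integrating the conditional probability $\mathbf{P}(B \mid \omega_1 = x)$ numerically over $[0, 1]$. The Python sketch below uses a plain midpoint rule; the function names and the number of steps are assumptions made for the illustration.

```python
import math

def conditional_triangle_probability(x):
    """P(B | omega_1 = x) for 0 < x < 1: x/(1-x) if x < 1/2, (1-x)/x otherwise."""
    shorter = min(x, 1 - x)          # length of the part that is not broken again
    return shorter / (1 - shorter)

def triangle_probability(steps=100_000):
    """Midpoint-rule approximation of E P(B | omega_1), omega_1 uniform on [0, 1]."""
    h = 1.0 / steps
    return h * sum(conditional_triangle_probability((i + 0.5) * h)
                   for i in range(steps))

print(triangle_probability(), 2 * math.log(2) - 1)
```

With an even number of steps the kink of the integrand at $x = 1/2$ falls on a cell boundary, so the midpoint rule keeps its usual second-order accuracy.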

All the above examples were on conditional expectations given random variables
(not $\sigma$-algebras).

The need for conditional expectations given $\sigma$-algebras arises where it is difficult
to manage working just with conditional expectations given random variables.
Assume, for instance, that a certain process is described by a sequence of random
variables $\{\xi_j\}_{j=-\infty}^{\infty}$ which are not independent. Then the most convenient way to
describe the distribution of $\xi_1$ given the whole “history” (i.e. the values $\xi_0$, $\xi_{-1}$,
$\xi_{-2}, \ldots$) is to take the conditional distribution of $\xi_1$ given $\sigma(\xi_0, \xi_{-1}, \ldots)$. It would
be difficult to confine oneself here to conditional distributions given random variables
only. Respective examples are given in Chaps. 13, 15-22.


Chapter  5 

Sequences  of  Independent  Trials 
with  Two  Outcomes 


Abstract The weak and strong laws of large numbers are established for the
Bernoulli scheme in Sect. 5.1. Then the local limit theorem on approximation of
the binomial probabilities is proved in Sect. 5.2 using Stirling’s formula (covering
both the normal approximation zone and the large deviations zone). The same section
also contains a refinement of that result, including a bound for the relative error
of the approximation, and an extension of the local limit theorem to polynomial
distributions. This is followed by the derivation of the de Moivre-Laplace theorem and
its refinements in Sect. 5.3. In Sect. 5.4, the coupling method is used to prove the
Poisson theorem for sums of non-identically distributed independent random indicators,
together with sharp approximation error bounds for the total variation distance.
The chapter ends with derivation of large deviation inequalities for the Bernoulli
scheme in Sect. 5.5.


5.1  Laws  of  Large  Numbers 

Suppose we have a sequence of trials in each of which a certain event $A$ can occur
with probability $p$ independently of the outcomes of other trials. Form a sequence
of random variables as follows. Put $\xi_k = 1$ if the event $A$ has occurred in
the $k$-th trial, and $\xi_k = 0$ otherwise. Then $\{\xi_k\}_{k=1}^{\infty}$ will be a sequence of independent
random variables which are identically distributed according to the Bernoulli
law: $\mathbf{P}(\xi_k = 1) = p$, $\mathbf{P}(\xi_k = 0) = q = 1 - p$, $\mathbf{E}\xi_k = p$, $\operatorname{Var}(\xi_k) = pq$. The sum
$S_n = \xi_1 + \cdots + \xi_n \subset= \mathbf{B}_p^n$ is simply the number of occurrences of the event $A$ in the
first $n$ trials. Clearly $\mathbf{E}S_n = np$ and $\operatorname{Var}(S_n) = npq$.

The  following  assertion  is  called  the  law  of  large  numbers  for  the  Bernoulli 
scheme. 


Theorem 5.1.1 For any $\varepsilon > 0$

\[ \mathbf{P}\Big(\Big|\frac{S_n}{n} - p\Big| > \varepsilon\Big) \to 0 \quad \text{as } n \to \infty. \]

This assertion is a direct consequence of Theorem 4.7.5. One can also obtain the
following stronger result:
Theorem 5.1.2 (The Strong Law of Large Numbers for the Bernoulli Scheme) For
any $\varepsilon > 0$, as $n \to \infty$,

\[ \mathbf{P}\Big(\sup_{k \ge n}\Big|\frac{S_k}{k} - p\Big| > \varepsilon\Big) \to 0. \]

The interpretation of this result is that the notion of probability which we introduced
in Chaps. 1 and 2 corresponds to the intuitive interpretation of probability
as the limiting value of the relative frequency of the occurrence of the event. Indeed,
$S_n/n$ could be considered as the relative frequency of the event $A$ for which
$\mathbf{P}(A) = p$. It turned out that, in a certain sense, $S_n/n$ converges to $p$.


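Proof of Theorem 5.1.2 One has

\[ \mathbf{P}\Big(\sup_{k \ge n}\Big|\frac{S_k}{k} - p\Big| > \varepsilon\Big) \le \sum_{k=n}^{\infty} \mathbf{P}\Big(\Big|\frac{S_k}{k} - p\Big| > \varepsilon\Big) \le \sum_{k=n}^{\infty} \frac{\mathbf{E}(S_k - kp)^4}{k^4\varepsilon^4}. \tag{5.1.1} \]

Here we again made use of Chebyshev’s inequality but this time for the fourth moments.
Expanding we find that

\[ \mathbf{E}(S_k - kp)^4 = \mathbf{E}\Big(\sum_{j=1}^{k}(\xi_j - p)\Big)^4 = \sum_{j=1}^{k}\mathbf{E}(\xi_j - p)^4 + 6\sum_{i<j}\mathbf{E}(\xi_i - p)^2(\xi_j - p)^2 \]
\[ = k(pq^4 + qp^4) + 3k(k - 1)(pq)^2 \le k + k(k - 1) = k^2. \tag{5.1.2} \]

Thus the probability we want to estimate does not exceed the sum

\[ \varepsilon^{-4}\sum_{k=n}^{\infty} k^{-2} \to 0 \quad \text{as } n \to \infty. \]
□

It is not hard to see that we would not have found the required bound if we used
Chebyshev’s inequality with second moments in (5.1.1).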

We could also note that one actually has much stronger bounds for
$\mathbf{P}(|S_k - kp| > \varepsilon k)$ than those that we made use of above. These will be derived
in Sect. 5.5.
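The fourth-moment identity and the bound in (5.1.2) can be checked numerically by summing over the binomial distribution of $S_k$. The Python sketch below does so for a few values of $k$ and $p$; the particular values tried are illustrative assumptions.

```python
import math

def fourth_central_moment(k, p):
    """E(S_k - kp)^4 computed directly from the binomial probabilities."""
    q = 1 - p
    return sum(math.comb(k, j) * p ** j * q ** (k - j) * (j - k * p) ** 4
               for j in range(k + 1))

def closed_form(k, p):
    """Right-hand side of the expansion: k(pq^4 + qp^4) + 3k(k-1)(pq)^2."""
    q = 1 - p
    return k * (p * q ** 4 + q * p ** 4) + 3 * k * (k - 1) * (p * q) ** 2

for p in (0.1, 0.5, 0.7):
    for k in (1, 5, 30):
        print(p, k, fourth_central_moment(k, p), closed_form(k, p), k ** 2)
```

The last printed column is the bound $k^2$, which makes the series $\sum_k k^{-2}$ in the proof appear.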


Corollary 5.1.1 If $f(x)$ is a continuous function on $[0, 1]$ then, as $n \to \infty$,

\[ \mathbf{E} f\Big(\frac{S_n}{n}\Big) \to f(p) \tag{5.1.3} \]

uniformly in $p$.




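Proof For any $\varepsilon > 0$,

\[ \Big|\mathbf{E} f\Big(\frac{S_n}{n}\Big) - f(p)\Big| \le \mathbf{E}\Big(\Big|f\Big(\frac{S_n}{n}\Big) - f(p)\Big|;\ \Big|\frac{S_n}{n} - p\Big| \le \varepsilon\Big) + \mathbf{E}\Big(\Big|f\Big(\frac{S_n}{n}\Big) - f(p)\Big|;\ \Big|\frac{S_n}{n} - p\Big| > \varepsilon\Big) \]
\[ \le \sup_{|x| \le \varepsilon} |f(p + x) - f(p)| + \delta_n(\varepsilon), \]

where the quantity $\delta_n(\varepsilon)$ is independent of $p$ by virtue of (5.1.1), (5.1.2), and
$\delta_n(\varepsilon) \to 0$ as $n \to \infty$. Since $f$ is uniformly continuous on $[0, 1]$, the first term on the
right-hand side can be made arbitrarily small, uniformly in $p$, by the choice of $\varepsilon$. □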


Corollary 5.1.2 If $f(x)$ is continuous on $[0, 1]$, then, as $n \to \infty$,

\[ \sum_{k=0}^{n} f\Big(\frac{k}{n}\Big)\binom{n}{k} x^k (1 - x)^{n-k} \to f(x) \]

uniformly in $x \in [0, 1]$.

This relation is just a different form of (5.1.3) since

\[ \mathbf{P}(S_n = k) = \binom{n}{k} p^k (1 - p)^{n-k} \]

(see Chap. 1). This relation implies the well-known Weierstrass theorem on approximation
of continuous functions by polynomials. Moreover, the required polynomials
are given here explicitly: they are Bernstein polynomials.
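The Bernstein polynomials of Corollary 5.1.2 are straightforward to evaluate. The Python sketch below computes $\sum_k f(k/n)\binom{n}{k}x^k(1 - x)^{n-k}$ and the maximal error on a grid; the test function $|x - 1/2|$, the grid size and the degrees are illustrative assumptions.

```python
import math

def bernstein(f, n, x):
    """Value at x of the Bernstein polynomial of degree n for f, i.e. E f(S_n/n) with p = x."""
    return sum(f(k / n) * math.comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

def sup_error(f, n, grid=201):
    """Largest deviation |B_n f(x) - f(x)| over an equally spaced grid on [0, 1]."""
    xs = [i / (grid - 1) for i in range(grid)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in xs)

def kink(x):
    """Continuous but non-smooth test function."""
    return abs(x - 0.5)

for n in (10, 40, 160):
    print(n, sup_error(kink, n))
```

For $f(x) = x^2$ the polynomial can be written in closed form, $x^2 + x(1 - x)/n$, which follows from $\mathbf{E}(S_n/n)^2 = p^2 + pq/n$ and gives a simple exact check of the implementation.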


5.2  The  Local  Limit  Theorem  and  Its  Refinements 
5.2.1  The  Local  Limit  Theorem 

We know that $\mathbf{P}(S_n = k) = \binom{n}{k} p^k q^{n-k}$, $q = 1 - p$. However, this formula becomes
very inconvenient for computations with large $n$ and $k$, which raises the question
about the asymptotic behaviour of the probability $\mathbf{P}(S_n = k)$ as $n \to \infty$.

In the sequel, we will write $a_n \sim b_n$ for two number sequences $\{a_n\}$ and $\{b_n\}$ if
$a_n/b_n \to 1$ as $n \to \infty$. Such sequences $\{a_n\}$ and $\{b_n\}$ will be said to be equivalent.
Set

\[ H(x) = x\ln\frac{x}{p} + (1 - x)\ln\frac{1 - x}{1 - p}, \qquad p^* = \frac{k}{n}. \tag{5.2.1} \]

Theorem 5.2.1 As $k \to \infty$ and $n - k \to \infty$,

\[ \mathbf{P}(S_n = k) = \mathbf{P}\Big(\frac{S_n}{n} = p^*\Big) \sim \frac{1}{\sqrt{2\pi n p^*(1 - p^*)}} \exp\{-nH(p^*)\}. \tag{5.2.2} \]
Proof We will make use of Stirling’s formula according to which $n! \sim \sqrt{2\pi n}\,n^n e^{-n}$
as $n \to \infty$. One has

\[ \mathbf{P}(S_n = k) = \binom{n}{k} p^k q^{n-k} \sim \sqrt{\frac{n}{2\pi k(n - k)}}\;\frac{n^n}{k^k (n - k)^{n-k}}\;p^k (1 - p)^{n-k} \]
\[ = \frac{1}{\sqrt{2\pi n p^*(1 - p^*)}} \exp\Big\{-k\ln\frac{k}{n} - (n - k)\ln\frac{n - k}{n} + k\ln p + (n - k)\ln(1 - p)\Big\} \]
\[ = \frac{1}{\sqrt{2\pi n p^*(1 - p^*)}} \exp\big\{-n\big[p^*\ln p^* + (1 - p^*)\ln(1 - p^*) - p^*\ln p - (1 - p^*)\ln(1 - p)\big]\big\} \]
\[ = \frac{1}{\sqrt{2\pi n p^*(1 - p^*)}} \exp\{-nH(p^*)\}. \]
□
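To see the equivalence of Theorem 5.2.1 at work, one can compare the exact binomial probability with $\exp\{-nH(p^*)\}/\sqrt{2\pi n p^*(1 - p^*)}$ for growing $n$ and a fixed ratio $p^* = k/n$. The Python sketch below is a hedged illustration: the values $p = 0.3$, $p^* = 0.4$ and the use of `math.lgamma` to evaluate the exact probability are choices made for the example.

```python
import math

def rate_function(x, p):
    """H(x) = x ln(x/p) + (1 - x) ln((1 - x)/(1 - p))."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

def exact_probability(n, k, p):
    """Binomial probability P(S_n = k), evaluated through log-gamma for stability."""
    log_value = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_value)

def local_approximation(n, k, p):
    """exp{-n H(p*)} / sqrt(2 pi n p*(1 - p*)) with p* = k/n."""
    p_star = k / n
    return (math.exp(-n * rate_function(p_star, p))
            / math.sqrt(2 * math.pi * n * p_star * (1 - p_star)))

p = 0.3
ratios = [exact_probability(n, 4 * n // 10, p) / local_approximation(n, 4 * n // 10, p)
          for n in (10, 100, 1000)]
print(ratios)
```

The ratios approach 1 even though $p^* = 0.4$ is far from $p = 0.3$, which reflects that the theorem covers the large deviations zone as well.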


If $p^* = k/n$ is close to $p$, then one can find another form for the right-hand side
of (5.2.2) which is of significant interest. Note that the function $H(x)$ is analytic on
the interval $(0, 1)$. Since

\[ H'(x) = \ln\frac{x}{p} - \ln\frac{1 - x}{1 - p}, \qquad H''(x) = \frac{1}{x} + \frac{1}{1 - x}, \tag{5.2.3} \]

one has $H(p) = H'(p) = 0$ and, as $p^* - p \to 0$,

\[ H(p^*) = \frac{1}{2}\Big(\frac{1}{p} + \frac{1}{q}\Big)(p^* - p)^2 + O(|p^* - p|^3). \]

Therefore if $p^* \sim p$ and $n(p^* - p)^3 \to 0$ then

\[ \mathbf{P}(S_n = k) \sim \frac{1}{\sqrt{2\pi npq}} \exp\Big\{-\frac{n}{2pq}(p^* - p)^2\Big\}. \]

Putting

\[ \Delta = \frac{1}{\sqrt{npq}}, \qquad \varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \]

one obtains the following assertion.
Corollary 5.2.1 If $z = n(p^* - p) = k - np = o(n^{2/3})$¹ then

\[ \mathbf{P}(S_n = k) = \mathbf{P}(S_n - np = z) \sim \varphi(z\Delta)\Delta, \tag{5.2.4} \]

where $\varphi = \varphi_{0,1}(x)$ is evidently the density of the normal distribution with parameters
$(0, 1)$.

¹ According to standard conventions, we will write $a(z) = o(b(z))$ as $z \to z_0$ if $b(z) > 0$ and
$\lim_{z \to z_0} a(z)/b(z) = 0$, and $a(z) = O(b(z))$ as $z \to z_0$ if $b(z) > 0$ and $\limsup_{z \to z_0} |a(z)|/b(z) < \infty$.
This formula also enables one to estimate the probabilities of the events of the
form $\{S_n < k\}$.

If $p^*$ differs substantially from $p$, then one could estimate the probabilities of
such events using the results of Sect. 1.3.


Example  5.2.1  In  a  jury  consisting  of  an  odd  number  n  =  2m  +  1  of  persons,  each 
member  makes  a  correct  decision  with  probability  p  =  0.7  independently  of  the 
other  members.  What  is  the  minimum  number  of  members  for  which  the  verdict 
rendered  by  the  majority  of  jury  members  will  be  correct  with  a  probability  of  at 
least  0.99? 

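Put $\xi_k = 1$ if the $k$-th jury member made a correct decision and $\xi_k = 0$ otherwise.
We are looking for odd numbers $n$ for which $\mathbf{P}(S_n \le m) \le 0.01$. It is evident that
such a trustworthy decision can be achieved only for large values of $n$. In that case,
as we established in Sect. 1.3, the probability $\mathbf{P}(S_n \le m)$ is approximately equal to

\[ \frac{(n + 1 - m)p}{(n + 1)p - m}\,\mathbf{P}(S_n = m) \approx \frac{p}{2p - 1}\,\mathbf{P}(S_n = m). \]

Using Theorem 5.2.1 and the fact that in our problem

\[ p^* \approx \frac{1}{2}, \qquad H\Big(\frac{1}{2}\Big) = -\frac{1}{2}\ln 4p(1 - p), \qquad H'\Big(\frac{1}{2}\Big) = \ln\frac{1 - p}{p}, \]

we get

\[ \mathbf{P}(S_n \le m) \approx \frac{p}{2p - 1}\sqrt{\frac{2}{\pi n}}\,\exp\Big\{-nH\Big(\frac{1}{2} - \frac{1}{2n}\Big)\Big\} \approx \frac{p}{2p - 1}\sqrt{\frac{2}{\pi n}}\,\exp\Big\{-nH\Big(\frac{1}{2}\Big) + \frac{1}{2}H'\Big(\frac{1}{2}\Big)\Big\} \]
\[ = \frac{\sqrt{2p(1 - p)}}{(2p - 1)\sqrt{\pi n}}\,\big(4p(1 - p)\big)^{n/2} \approx \frac{0.914}{\sqrt{n}}\,(0.84)^{n/2}. \]

On the right-hand side there is a monotonically decreasing function $a(n)$. Solving
the equation $a(n) = 0.01$ we get the answer $n = 33$. The same result will be obtained
if one makes use of the explicit formulae.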
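The last step of Example 5.2.1 can be reproduced with a few lines of code. The Python sketch below evaluates only the approximating function, taken here in the closed form $a(n) = \sqrt{2p(1 - p)}\,(4p(1 - p))^{n/2} / ((2p - 1)\sqrt{\pi n})$ that results from combining the factor $p/(2p - 1)$ with Theorem 5.2.1 at $p^* \approx 1/2$; it then searches over odd $n$. The function names are assumptions made for the illustration.

```python
import math

def approximate_error(n, p):
    """a(n): approximation of P(S_n <= m) for n = 2m + 1 and p > 1/2."""
    coefficient = math.sqrt(2 * p * (1 - p)) / ((2 * p - 1) * math.sqrt(math.pi * n))
    return coefficient * (4 * p * (1 - p)) ** (n / 2)

def smallest_odd_jury(p, level):
    """Smallest odd n with a(n) <= level (a(n) is decreasing in n)."""
    n = 1
    while approximate_error(n, p) > level:
        n += 2
    return n

print(smallest_odd_jury(0.7, 0.01))
```

Because $a(n)$ decreases monotonically, a simple upward scan over odd $n$ is enough, and no root-finding routine is needed.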


5.2.2  Refinements  of  the  Local  Theorem 


It is not hard to bound the error of approximation (5.2.2). If, in Stirling’s formula
$n! = \sqrt{2\pi n}\,n^n e^{-n + \theta(n)}$, we make use of the well-known inequalities²

\[ \frac{1}{12n + 1} < \theta(n) < \frac{1}{12n}, \]

then the same argument will give the following refinement of Theorem 5.2.1.

² See, e.g., [12], Sect. 2.9.
Theorem 5.2.2

\[ \mathbf{P}(S_n = k) = \frac{1}{\sqrt{2\pi n p^*(1 - p^*)}} \exp\{-nH(p^*) + \theta(k, n)\}, \tag{5.2.5} \]

where

\[ |\theta(k, n)| = |\theta(n) - \theta(k) - \theta(n - k)| < \frac{1}{12k} + \frac{1}{12(n - k)} = \frac{1}{12np^*(1 - p^*)}. \tag{5.2.6} \]

Relation  (5.2.4)  could  also  be  refined  as  follows. 


Theorem 5.2.3 For all $k$ such that $|p^* - p| < \frac{1}{2}\min(p, q)$ one has

\[ \mathbf{P}(S_n = k) = \varphi(z\Delta)\Delta\,(1 + \varepsilon(k, n)), \]

where

\[ 1 + \varepsilon(k, n) = \exp\Big\{\vartheta\Big(\frac{|z|^3\Delta^4}{3} + \Big(|z| + \frac{1}{6}\Big)\Delta^2\Big)\Big\}, \qquad |\vartheta| < 1. \]

As one can easily see from the properties of the Taylor expansion of the function
$e^x$, the order of magnitude of the term $\varepsilon(k, n)$ in the above formulae coincides
with that of the argument of the exponential. Hence it follows from Theorem 5.2.3
that for $z = k - np = o(\Delta^{-4/3})$ or, which is the same, $z = o(n^{2/3})$, one
still has (5.2.4).


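Proof We will make use of Theorem 5.2.2. In addition to formulae (5.2.3) one can
write:

\[ H^{(k)}(x) = \frac{(-1)^k (k - 2)!}{x^{k-1}} + \frac{(k - 2)!}{(1 - x)^{k-1}}, \quad k \ge 2, \]

\[ H(p^*) = \frac{1}{2pq}(p^* - p)^2 + R_1, \]

where we can estimate the residual $R_1 = \sum_{k=3}^{\infty} \frac{H^{(k)}(p)}{k!}(p^* - p)^k$. Taking into account
that

\[ |H^{(k)}(p)| \le (k - 2)!\Big(\frac{1}{p^{k-1}} + \frac{1}{q^{k-1}}\Big), \quad k \ge 2, \]

and letting for brevity $|p^* - p| = \delta$, we get for $\delta < \frac{1}{2}\min(p, q)$ the bounds

\[ |R_1| \le \sum_{k=3}^{\infty} \frac{(k - 2)!}{k!}\Big(\frac{1}{p^{k-1}} + \frac{1}{q^{k-1}}\Big)\delta^k \le \frac{\delta^3}{6}\Big(\frac{1}{p^2}\cdot\frac{1}{1 - \delta/p} + \frac{1}{q^2}\cdot\frac{1}{1 - \delta/q}\Big) \le \frac{\delta^3}{6}\Big(\frac{2}{p^2} + \frac{2}{q^2}\Big) < \frac{\delta^3}{3(pq)^2}. \]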
2  * 
From this it follows that

\[ -nH(p^*) = -\frac{(k - np)^2}{2npq} + \frac{\vartheta_1 |k - np|^3}{3(npq)^2} = -\frac{z^2\Delta^2}{2} + \frac{\vartheta_1 |z|^3\Delta^4}{3}, \qquad |\vartheta_1| < 1. \tag{5.2.7} \]


We now turn to the other factors in equality (5.2.5) and consider the product
$p^*(1 - p^*)$. Since $-p \le 1 - p - p^* \le 1 - p$, we have

\[ |p^*(1 - p^*) - p(1 - p)| = |(p - p^*)(1 - p - p^*)| \le |p^* - p|\max(p, q). \]

This implies in particular that, for $|p^* - p| < \frac{1}{2}\min(p, q)$, one has

\[ |p^*(1 - p^*) - pq| < \frac{1}{2}pq, \qquad p^*(1 - p^*) > \frac{1}{2}pq. \]


Therefore one can write along with (5.2.6) that, for the values of $k$ indicated in
Theorem 5.2.3,

\[ |\theta(k, n)| < \frac{1}{6npq} = \frac{\Delta^2}{6}. \tag{5.2.8} \]


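It remains to consider the factor $[p^*(1 - p^*)]^{-1/2}$. Since for $|y| < 1/2$

\[ |\ln(1 + y)| = \Big|\int_1^{1+y} \frac{1}{x}\,dx\Big| < 2|y|, \]

one has for $\delta = |p^* - p| < (1/2)\min(p, q)$ the relations

\[ \ln(p^*(1 - p^*)) = \ln pq + \ln\Big(1 + \frac{p^*(1 - p^*) - pq}{pq}\Big) = \ln(pq) + \ln\Big(1 - \frac{\vartheta^*\delta}{pq}\Big) = \ln(pq) - \frac{2\vartheta_2\delta}{pq}, \]
\[ |\vartheta^*| \le \max(p, q), \qquad |\vartheta_2| < \max(p, q); \]

\[ [p^*(1 - p^*)]^{-1/2} = [pq]^{-1/2}\exp\Big\{\frac{\vartheta_2\delta}{pq}\Big\}. \tag{5.2.9} \]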


Using representations (5.2.7)-(5.2.9) and the assertion of Theorem 5.2.2 completes
the proof. □

One can see from the above estimates that the bounds for $\vartheta$ in the statement
of Theorem 5.2.3 can be narrowed if we consider smaller deviations $|p^* - p|$, for
instance if they do not exceed the value $\alpha\min(p, q)$ where $\alpha < 1/2$.

The relations for $\mathbf{P}(S_n = k)$ that we found are the so-called local limit theorems
for the Bernoulli scheme and their refinements.
5.2.3 The Local Limit Theorem for the Polynomial Distributions


The basic asymptotic formula given in Theorem 5.2.1 admits a natural extension to the polynomial distribution $\mathbf{B}^n_p$, $p = (p_1, \ldots, p_r)$, when, in a sequence of independent trials, in each of the trials one has not two but $r \ge 2$ possible outcomes $A_1, \ldots, A_r$ of which the probabilities are equal to $p_1, \ldots, p_r$, respectively. Let $S_n^{(j)}$ be the number of occurrences of the event $A_j$ in n trials,

$$S_n = (S_n^{(1)}, \ldots, S_n^{(r)}), \quad k = (k_1, \ldots, k_r), \quad p^* = \frac{k}{n},$$

and put $H(x) = \sum_{i} x_i \ln(x_i/p_i)$, $x = (x_1, \ldots, x_r)$. Clearly, $S_n \mathrel{\subset\!\!\!=} \mathbf{B}^n_p$. The following assertion is a direct extension of Theorem 5.2.1.


Theorem 5.2.4 If each of the r variables $k_1, \ldots, k_r$ is either zero or tends to $\infty$ as $n \to \infty$ then

$$\mathbf{P}(S_n = k) \sim (2\pi n)^{(1 - r_0)/2} \Bigl(\prod_{j:\, k_j \ne 0} p_j^*\Bigr)^{-1/2} \exp\{-nH(p^*)\},$$

where $r_0$ is the number of variables $k_1, \ldots, k_r$ which are not equal to zero.
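The asymptotic formula of Theorem 5.2.4 is easy to probe numerically. The following is only an illustrative sketch: the function names `multinomial_pmf` and `local_approximation` are hypothetical (not from the text), and the exact probability is computed through `math.lgamma` to avoid overflow.

```python
import math

def multinomial_pmf(k, p):
    """Exact P(S_n = k) for the polynomial (multinomial) distribution."""
    n = sum(k)
    log_pmf = math.lgamma(n + 1)
    for kj, pj in zip(k, p):
        if kj > 0:
            log_pmf += kj * math.log(pj) - math.lgamma(kj + 1)
    return math.exp(log_pmf)

def local_approximation(k, p):
    """Right-hand side of Theorem 5.2.4 (only nonzero k_j enter)."""
    n = sum(k)
    nonzero = [(kj / n, pj) for kj, pj in zip(k, p) if kj > 0]
    r0 = len(nonzero)
    # H(p*) = sum of p*_j ln(p*_j / p_j) over the nonzero coordinates
    h = sum(x * math.log(x / pj) for x, pj in nonzero)
    prod = math.prod(x for x, _ in nonzero)
    return (2 * math.pi * n) ** ((1 - r0) / 2) * prod ** -0.5 * math.exp(-n * h)

if __name__ == "__main__":
    k, p = (900, 900, 1200), (0.3, 0.3, 0.4)
    exact, approx = multinomial_pmf(k, p), local_approximation(k, p)
    print(exact, approx, exact / approx)
```

For large $k_j$ the printed ratio should be close to 1, the discrepancy being of the order of the Stirling remainders $1/(12k_j)$.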


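Proof As in the proof of Theorem 5.2.1, we will use Stirling's formula

$$n! \sim \sqrt{2\pi n}\, e^{-n} n^n$$

as $n \to \infty$. Assuming without loss of generality that all $k_j \to \infty$, $j = 1, \ldots, r$, we get

$$\mathbf{P}(S_n = k) = \frac{n!}{k_1! \cdots k_r!} \prod_{j=1}^r p_j^{k_j} \sim (2\pi n)^{(1-r)/2} \Bigl(\prod_{j=1}^r \frac{n}{k_j}\Bigr)^{1/2} \exp\Bigl\{\sum_{j=1}^r k_j \ln\frac{n p_j}{k_j}\Bigr\} = (2\pi n)^{(1-r)/2} \Bigl(\prod_{j=1}^r p_j^*\Bigr)^{-1/2} \exp\{-nH(p^*)\}. \qquad \Box$$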


5.3  The  de  Moivre-Laplace  Theorem  and  Its  Refinements 

Let a and b be two fixed numbers and $\zeta_n = (S_n - np)/\sqrt{npq}$. Then

$$\mathbf{P}(a < \zeta_n < b) = \sum_{a\sqrt{npq} < z < b\sqrt{npq}} \mathbf{P}(S_n - np = z).$$

If, instead of $\mathbf{P}(S_n - np = z)$, we substitute here the values $\varphi(z\Delta)\Delta$ (see Corollary 5.2.1), we will get an integral sum $\sum_{a < z\Delta < b} \varphi(z\Delta)\Delta$ corresponding to the integral $\int_a^b \varphi(x)\,dx$.


Thus relations (5.2.4) make the equality

$$\lim_{n\to\infty} \mathbf{P}(a < \zeta_n < b) = \int_a^b \varphi(x)\,dx = \Phi(b) - \Phi(a) \tag{5.3.1}$$

plausible, where $\Phi(x)$ is the normal distribution function with parameters (0, 1):

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt.$$


This is the de Moivre-Laplace theorem, which is one of the so-called integral limit theorems that describe probabilities of the form $\mathbf{P}(S_n < x)$. In Chap. 8 we will derive more general integral theorems from which (5.3.1) will follow as a special case.

Theorem 5.2.3 makes it possible to obtain (5.3.1) together with an error bound or, in other words, with a bound for the convergence rate.

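Let A and B be integers,

$$a = \frac{A - np}{\sqrt{npq}}, \qquad b = \frac{B - np}{\sqrt{npq}}. \tag{5.3.2}$$


Theorem 5.3.1 Let $b > a$, $c = \max(|a|, |b|)$, and

$$\rho = \frac{c^3 + 3c}{3}\,\Delta + \frac{\Delta^2}{6}.$$

If $\Delta = 1/\sqrt{npq} \le 1/2$ and $\rho \le 1/2$ then

$$\mathbf{P}(A \le S_n < B) = \mathbf{P}(a \le \zeta_n < b) = \int_a^b \varphi(t)\,dt\,(1 + \theta_1 \Delta c)(1 + 2\theta_2 \rho), \tag{5.3.3}$$

where $|\theta_i| \le 1$, $i = 1, 2$.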


This theorem shows that the left-hand side in (5.3.3) can be equivalent to $\Phi(b) - \Phi(a)$ for growing a and b as well. In that case, $\Phi(b) - \Phi(a)$ can converge to 0, and knowing the relative error in (5.3.1) is more convenient since its smallness enables one to establish that of the absolute error as well, but not vice versa.
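As a rough numerical illustration of Theorem 5.3.1, the sketch below compares the exact binomial probability $\mathbf{P}(A \le S_n < B)$ with $\Phi(b) - \Phi(a)$ and with the largest relative error permitted by (5.3.3). The helper names (`normal_cdf`, `binom_pmf`, `de_moivre_laplace_check`) are illustrative assumptions, not notation from the text.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_pmf(n, k, p):
    """P(S_n = k), computed through logarithms for numerical stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def de_moivre_laplace_check(n, p, A, B):
    """Return (exact, normal, observed relative error, admissible relative error)."""
    q = 1 - p
    delta = 1 / math.sqrt(n * p * q)
    a, b = (A - n * p) * delta, (B - n * p) * delta
    c = max(abs(a), abs(b))
    rho = (c ** 3 + 3 * c) / 3 * delta + delta ** 2 / 6
    if delta > 0.5 or rho > 0.5:
        raise ValueError("conditions of Theorem 5.3.1 are not met")
    exact = sum(binom_pmf(n, k, p) for k in range(A, B))  # A <= S_n < B
    normal = normal_cdf(b) - normal_cdf(a)
    admissible = (1 + delta * c) * (1 + 2 * rho) - 1
    return exact, normal, abs(exact / normal - 1), admissible

if __name__ == "__main__":
    print(de_moivre_laplace_check(10_000, 0.5, 4950, 5050))
```

In this particular case the observed relative error is far smaller than the admissible one, which is in line with the remark that the bound is effective but rather rough.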


Proof First we note that, for all k such that $|z| = |k - np| < c\sqrt{npq}$, the conditions of Theorem 5.2.3 will hold. Indeed, to have the inequality $|p^* - p| < \frac12 \min(p, q)$ it suffices that $|k - np| < npq/2 = 1/(2\Delta^2)$. This inequality will hold if $c < 1/(2\Delta)$. But since $\rho \le 1/2$, one has

$$\frac{c(c^2 + 3)\Delta}{3} < \frac12, \qquad c\Delta < \frac12.$$


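Thus, for each k such that $a\sqrt{npq} \le z < b\sqrt{npq}$, we can make use of Theorem 5.2.3 to conclude that

$$\mathbf{P}(A \le S_n < B) = \sum_{a\sqrt{npq} \le z < b\sqrt{npq}} \mathbf{P}(S_n = k) = \sum_{a \le z\Delta < b} \varphi(z\Delta)\Delta \left[1 + \left(\exp\left\{\theta\left[\frac{|z|^3\Delta^4}{3} + \left(|z| + \frac16\right)\Delta^2\right]\right\} - 1\right)\right], \tag{5.3.4}$$

where $|\theta| \le 1$. Since, for $\rho \le 1$,

$$\frac{e^\rho - 1}{\rho} \le e - 1 < 2,$$

the absolute value of the correction term in (5.3.4) does not exceed (substituting there $|z|\Delta = c$)

$$\left|\exp\left\{\theta\left(\frac{c^3\Delta}{3} + c\Delta + \frac{\Delta^2}{6}\right)\right\} - 1\right| \le 2|\theta|\left(\frac{c^3\Delta}{3} + c\Delta + \frac{\Delta^2}{6}\right) = 2|\theta|\rho.$$

Therefore

$$\mathbf{P}(A \le S_n < B) = \sum_{a \le z\Delta < b} \varphi(z\Delta)\Delta\,[1 + 2\theta_2\rho], \tag{5.3.5}$$

where $|\theta_2| \le 1$.

Now we transform the sum on the right-hand side of the last equality. To this end, note that, for any smooth function $\varphi(x)$,

$$\left|\Delta\varphi(x) - \int_x^{x+\Delta} \varphi(t)\,dt\right| \le \frac{\Delta^2}{2} \max_{x \le t \le x+\Delta} |\varphi'(t)|. \tag{5.3.6}$$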


But for the function $\varphi(x) = (2\pi)^{-1/2} e^{-x^2/2}$ one has $\varphi'(x) = -x\varphi(x)$ and the maximum value of $\varphi(t)$ on the segment $[x, x + \Delta]$, $|x| \le c$, differs from the minimum value by not more than the factor $\exp\{c\Delta + \Delta^2/2\}$. Therefore, for $|x| \le c$, one has by virtue of (5.3.6)

$$\left|\Delta\varphi(x) - \int_x^{x+\Delta} \varphi(t)\,dt\right| \le \frac{\Delta^2 c}{2}\, e^{c\Delta + \Delta^2/2} \min_{x \le t \le x+\Delta} \varphi(t) \le \frac{\Delta c}{2}\, e^{c\Delta + \Delta^2/2} \int_x^{x+\Delta} \varphi(t)\,dt.$$

Since $c\Delta + \Delta^2/2 \le 1/2 + 1/8$, $e^{c\Delta + \Delta^2/2} \le 2$, we have the representation

$$\Delta\varphi(x) = \int_x^{x+\Delta} \varphi(t)\,dt\,(1 + \theta_1\Delta c), \quad |\theta_1| \le 1.$$

Substituting this into (5.3.5) we obtain the assertion of the theorem. □

Thus by Theorem 5.3.1 the difference

$$\bigl|\mathbf{P}(x \le \zeta_n < y) - (\Phi(y) - \Phi(x))\bigr| \tag{5.3.7}$$

can be effectively, yet rather roughly, bounded from above by a quantity of the order $1/\sqrt{npq}$ if $x = a$, $y = b$ (assuming that a and b are values which can be represented in the form $(k - np)\Delta$, see (5.3.2)). If x and y do not belong to the mentioned lattice with the span $\Delta$ then the error (5.3.7) will still be of the same order since, for instance, when y varies, $\mathbf{P}(x \le \zeta_n < y)$ remains constant on the semi-intervals of the form $(a + k\Delta, a + (k+1)\Delta]$, while the function $\Phi(y) - \Phi(x)$ increases monotonically with a bounded derivative. A similar argument holds for the left end point x. It is important to note that the error order $1/\sqrt{npq}$ cannot be improved, for the jumps of the distribution function of $\zeta_n$ are just of this order of magnitude by Theorem 5.2.2.

Theorem 5.3.1 enables one to use the normal approximation for $\mathbf{P}(x < \zeta_n < y)$ in the so-called large deviations range as well, when both x and y grow in absolute value and are of the same sign. In that case, both $\Phi(y) - \Phi(x)$ and the probability to be approximated tend to zero. Therefore the approximation can be considered satisfactory only if

$$\frac{\mathbf{P}(x < \zeta_n < y)}{\Phi(y) - \Phi(x)} \to 1. \tag{5.3.8}$$

As Theorem 5.3.1 shows, this convergence will take place if

$$c = \max(|x|, |y|) = o(\Delta^{-1/3})$$

or, which is the same, $c = o(n^{1/6})$. For more details about large deviation probabilities, see Chap. 9.

For larger values of c, as one could verify using Theorem 5.2.1, relation (5.3.8) will, generally speaking, not hold.

In conclusion we note that since

$$\mathbf{P}(|\zeta_n| > b) \to 0$$

as $b \to \infty$, it follows immediately from Theorem 5.3.1 that, for any fixed y,

$$\lim_{n\to\infty} \mathbf{P}(\zeta_n < y) = \Phi(y).$$

Later we will show that this assertion remains true under much wider assumptions, when $\zeta_n$ is a scaled sum of arbitrarily distributed random variables having finite variances.


5.4  The  Poisson  Theorem  and  Its  Refinements 

5.4.1  Quantifying  the  Closeness  of  Poisson  Distributions  to  Those 
of  the  Sums  Sn 

As we saw from the bounds in the last section, the de Moivre-Laplace theorem gives a good approximation to the probabilities of interest if the number npq (the variance of $S_n$) is large. This number will grow together with n if p and q are fixed positive numbers. But what will happen in a problem where, say, p = 0.001 and n = 1000 so that np = 1? Although n is large here, applying the de Moivre-Laplace theorem in such a problem would be meaningless. It turns out that in this case the distribution $\mathbf{P}(S_n = k)$ can be well approximated by the Poisson distribution $\Pi_\mu$ with an appropriate parameter value $\mu$ (see Sect. 5.4.2). Recall that

$$\Pi_\mu(B) = \sum_{0 \le k \in B} \frac{\mu^k}{k!}\, e^{-\mu}.$$

Put $np = \mu$.


Theorem 5.4.1 For all sets B,

$$|\mathbf{P}(S_n \in B) - \Pi_\mu(B)| \le \frac{\mu^2}{n}.$$

We could prove this assertion in the same way as the local theorem, making use of the explicit formula for $\mathbf{P}(S_n = k)$. However, we can prove it in a simpler and nicer way which could be called the common probability space method, or coupling method. The method is often used in research in probability theory and consists, in our case, of constructing on a common probability space random variables $S_n$ and $S_n^*$, the latter being as close to $S_n$ as possible and distributed according to the Poisson distribution.

It is also important that the common probability space method admits, without any complications, extension to the case of non-identically distributed random variables, when the probability of getting 1 in a particular trial depends on the number of the trial. Thus we will now prove a more general assertion of which Theorem 5.4.1 is a special case.

Assume that we are given a sequence of independent random variables $\xi_1, \ldots, \xi_n$, such that $\xi_j \mathrel{\subset\!\!\!=} \mathbf{B}_{p_j}$. Put, as above, $S_n = \sum_{j=1}^n \xi_j$. The theorem we state below is intended for approximating the probability $\mathbf{P}(S_n = k)$ when the $p_j$ are small and the number $\mu = \sum_{j=1}^n p_j$ is "comparable" with 1.


Theorem 5.4.2 For all sets B,

$$|\mathbf{P}(S_n \in B) - \Pi_\mu(B)| \le \sum_{j=1}^n p_j^2.$$

To prove this theorem we will need an important "stability" property of the Poisson distribution.


Lemma 5.4.1 If $\eta_1$ and $\eta_2$ are independent, $\eta_1 \mathrel{\subset\!\!\!=} \Pi_{\mu_1}$ and $\eta_2 \mathrel{\subset\!\!\!=} \Pi_{\mu_2}$, then $\eta_1 + \eta_2 \mathrel{\subset\!\!\!=} \Pi_{\mu_1 + \mu_2}$.¹


¹This fact will also easily follow from the properties of characteristic functions dealt with in Chap. 7.
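Proof By the total probability formula,

$$\mathbf{P}(\eta_1 + \eta_2 = k) = \sum_{j=0}^{k} \mathbf{P}(\eta_1 = j)\,\mathbf{P}(\eta_2 = k - j) = \sum_{j=0}^{k} \frac{\mu_1^j e^{-\mu_1}}{j!}\,\frac{\mu_2^{k-j} e^{-\mu_2}}{(k-j)!} = \frac{e^{-(\mu_1+\mu_2)}}{k!} \sum_{j=0}^{k} \binom{k}{j} \mu_1^j \mu_2^{k-j} = \frac{(\mu_1 + \mu_2)^k}{k!}\, e^{-(\mu_1+\mu_2)}. \qquad \Box$$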


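Proof of Theorem 5.4.2 Let $\omega_1, \ldots, \omega_n$ be independent random variables, each being the identity function ($\xi(\omega_k) = \omega_k$) on the unit interval with the uniform distribution. We can assume that the vector $\omega = (\omega_1, \ldots, \omega_n)$ is given as the identity function on the unit n-dimensional cube $\Omega$ with the uniform distribution.

Now construct the random variables $\xi_j$ and $\xi_j^*$ on $\Omega$ as follows:

$$\xi_j(\omega) = \begin{cases} 0 & \text{if } \omega_j < 1 - p_j, \\ 1 & \text{if } \omega_j \ge 1 - p_j, \end{cases} \qquad \xi_j^*(\omega) = \begin{cases} 0 & \text{if } \omega_j < e^{-p_j}, \\ k \ge 1 & \text{if } \omega_j \in [\pi_{k-1}, \pi_k), \end{cases}$$

where $\pi_k = \sum_{m \le k} e^{-p_j} \frac{p_j^m}{m!}$, $k = 0, 1, \ldots$.

It is evident that the $\xi_j(\omega)$ are independent and $\xi_j(\omega) \mathrel{\subset\!\!\!=} \mathbf{B}_{p_j}$; the $\xi_j^*(\omega)$ are also jointly independent with $\xi_j^*(\omega) \mathrel{\subset\!\!\!=} \Pi_{p_j}$. Now note that since $1 - p_j \le e^{-p_j}$ one has $\xi_j(\omega) \ne \xi_j^*(\omega)$ only if $\omega_j \in [1 - p_j, e^{-p_j})$ or $\omega_j \in [e^{-p_j} + p_j e^{-p_j}, 1]$. Hence

$$\mathbf{P}(\xi_j \ne \xi_j^*) = (e^{-p_j} - 1 + p_j) + (1 - e^{-p_j} - p_j e^{-p_j}) = p_j(1 - e^{-p_j}) \le p_j^2$$

and

$$\mathbf{P}(S_n \ne S_n^*) \le \mathbf{P}\Bigl(\bigcup_{j=1}^n \{\xi_j \ne \xi_j^*\}\Bigr) \le \sum_{j=1}^n p_j^2,$$

where $S_n^* = \sum_{j=1}^n \xi_j^* \mathrel{\subset\!\!\!=} \Pi_\mu$.

Now we can write

$$\mathbf{P}(S_n \in B) = \mathbf{P}(S_n \in B, S_n = S_n^*) + \mathbf{P}(S_n \in B, S_n \ne S_n^*) = \mathbf{P}(S_n^* \in B) - \mathbf{P}(S_n^* \in B, S_n \ne S_n^*) + \mathbf{P}(S_n \in B, S_n \ne S_n^*),$$

so that

$$|\mathbf{P}(S_n \in B) - \mathbf{P}(S_n^* \in B)| = |\mathbf{P}(S_n \in B, S_n \ne S_n^*) - \mathbf{P}(S_n^* \in B, S_n \ne S_n^*)| \le \mathbf{P}(S_n \ne S_n^*). \tag{5.4.1}$$

The assertion of the theorem follows from this in an obvious way. □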
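The bound of Theorem 5.4.2 can be checked numerically for a concrete collection of success probabilities. The sketch below is an illustration under stated assumptions: it uses the standard fact that, for distributions on the integers, $\sup_B |\mathbf{P}(S_n \in B) - \Pi_\mu(B)|$ equals half the sum of the absolute differences of the point probabilities, and all function names are hypothetical.

```python
import math

def bernoulli_sum_pmf(ps):
    """Distribution of S_n = xi_1 + ... + xi_n for independent Bernoulli(p_j) terms."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)      # xi_j = 0
            nxt[k + 1] += mass * p        # xi_j = 1
        pmf = nxt
    return pmf

def poisson_pmf(mu, k):
    """Pi_mu({k}) computed through logarithms."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def sup_distance_to_poisson(ps):
    """sup over B of |P(S_n in B) - Pi_mu(B)| with mu = sum of p_j."""
    mu = sum(ps)
    pmf = bernoulli_sum_pmf(ps)
    pois = [poisson_pmf(mu, k) for k in range(len(pmf))]
    l1 = sum(abs(a - b) for a, b in zip(pmf, pois))
    tail = max(1.0 - sum(pois), 0.0)      # Poisson mass above n, where S_n has none
    return 0.5 * (l1 + tail)

if __name__ == "__main__":
    ps = [0.01, 0.02, 0.03] * 20
    print(sup_distance_to_poisson(ps), sum(p * p for p in ps))
```

In this example the actual distance turns out to be noticeably smaller than $\sum p_j^2$, so the bound of the theorem is safe but not tight.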


Remark 5.4.1 One can give other common probability space constructions as well. One of them will be used now to show that there exists a better Poisson approximation to the distribution of $S_n$.

Namely, let $\xi_j^*(\omega)$ be independent random variables distributed according to the Poisson laws with parameters $r_j = -\ln(1 - p_j) \ge p_j$, so that $\mathbf{P}(\xi_j^* = 0) = e^{-r_j} = 1 - p_j$. Then $\xi_j(\omega) = \min\{1, \xi_j^*(\omega)\} \mathrel{\subset\!\!\!=} \mathbf{B}_{p_j}$ and, moreover,

$$\mathbf{P}\Bigl(\bigcup_{j=1}^n \{\xi_j(\omega) \ne \xi_j^*(\omega)\}\Bigr) \le \sum_{j=1}^n \mathbf{P}(\xi_j^*(\omega) \ge 2) = \sum_{j=1}^n \bigl(1 - e^{-r_j} - r_j e^{-r_j}\bigr).$$

But for $r = -\ln(1 - p)$ one has the inequality

$$1 - e^{-r} - r e^{-r} = p + (1 - p)\ln(1 - p) \le p + (1 - p)\Bigl(-p - \frac{p^2}{2}\Bigr) = \frac{p^2}{2}(1 + p).$$

Hence for the new Poisson approximation we have

$$\mathbf{P}\Bigl(\bigcup_{j=1}^n \{\xi_j \ne \xi_j^*\}\Bigr) \le \frac12 \sum_{j=1}^n p_j^2(1 + p_j).$$

Putting $\lambda = -\sum_{j=1}^n \ln(1 - p_j) \ge \sum_{j=1}^n p_j$, the same argument as above will lead to the bound

$$\sup_B |\mathbf{P}(S_n \in B) - \Pi_\lambda(B)| \le \frac12 \sum_{j=1}^n p_j^2(1 + p_j).$$

This bound of the rate of approximation given by the Poisson distribution with a "slightly shifted" parameter is better than that obtained in Theorem 5.4.2. Moreover, one could note that, in the new construction, $\xi_j \le \xi_j^*$, $S_n \le S_n^*$, and consequently

$$\mathbf{P}(S_n \ge k) \le \mathbf{P}(S_n^* \ge k) = \Pi_\lambda([k, \infty)).$$
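Both conclusions of Remark 5.4.1 (the sharper bound for the "shifted" parameter $\lambda$ and the stochastic domination of $S_n$ by a Poisson variable) can be illustrated with a small computation. This is a sketch only; the function names and the tolerance `1e-12` used to absorb rounding noise are assumptions made for the illustration.

```python
import math

def bernoulli_sum_pmf(ps):
    """Distribution of the sum of independent Bernoulli(p_j) variables."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf

def poisson_pmf(lam, k):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def shifted_poisson_report(ps):
    """Return (lambda, sup-distance to Pi_lambda, bound of Remark 5.4.1, domination flag)."""
    lam = -sum(math.log1p(-p) for p in ps)
    pmf = bernoulli_sum_pmf(ps)
    pois = [poisson_pmf(lam, k) for k in range(len(pmf))]
    sup_dist = 0.5 * (sum(abs(a - b) for a, b in zip(pmf, pois))
                      + max(1.0 - sum(pois), 0.0))
    bound = 0.5 * sum(p * p * (1 + p) for p in ps)
    # Check P(S_n >= k) <= Pi_lambda([k, infinity)) for k = 0, ..., n.
    dominated, cum_s, cum_pi = True, 0.0, 0.0
    for k in range(len(pmf)):
        if (1.0 - cum_s) > (1.0 - cum_pi) + 1e-12:
            dominated = False
        cum_s += pmf[k]
        cum_pi += pois[k]
    return lam, sup_dist, bound, dominated

if __name__ == "__main__":
    print(shifted_poisson_report([0.1] * 10))
```

For ten trials with $p_j = 0.1$ the bound of Theorem 5.4.2 equals 0.1, whereas the bound of the remark equals 0.055.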


5.4.2  The  Triangular  Array  Scheme.  The  Poisson  Theorem 

Now we will return to the case of identically distributed $\xi_k$. To obtain from Theorem 5.4.2 a limit theorem of the type similar to that of the de Moivre-Laplace theorem (see (5.3.1)), one needs a somewhat different setup. In fact, to ensure that np remains bounded as n increases, $p = \mathbf{P}(\xi_k = 1)$ needs to converge to zero, which cannot be the case when we consider a fixed sequence of random variables $\xi_1, \xi_2, \ldots$.

We introduce a sequence of rows (of growing length) of random variables:

$$\begin{array}{l} \xi_1^{(1)} \\ \xi_1^{(2)},\ \xi_2^{(2)} \\ \xi_1^{(3)},\ \xi_2^{(3)},\ \xi_3^{(3)} \\ \cdots\cdots\cdots \\ \xi_1^{(n)},\ \xi_2^{(n)},\ \xi_3^{(n)},\ \ldots,\ \xi_n^{(n)} \end{array}$$

This is the so-called triangular array scheme. The superscript denotes the row number, while the subscript denotes the number of the variable in the row.

Assume that the variables $\xi_k^{(n)}$ in the n-th row are independent and $\xi_k^{(n)} \mathrel{\subset\!\!\!=} \mathbf{B}_{p_n}$, $k = 1, \ldots, n$.


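Corollary 5.4.1 (The Poisson theorem) If $np_n \to \mu > 0$ as $n \to \infty$ then, for each fixed k,

$$\mathbf{P}(S_n = k) \to \Pi_\mu(\{k\}), \tag{5.4.2}$$

where $S_n = \xi_1^{(n)} + \cdots + \xi_n^{(n)}$.


Proof This assertion is an immediate corollary of Theorem 5.4.1. It can also be obtained directly, by noting that it follows from the equality

$$\mathbf{P}(S_n = k) = \binom{n}{k} p^k (1 - p)^{n-k}$$

(with $p = p_n$) that

$$\mathbf{P}(S_n = 0) = e^{n\ln(1-p)} \to e^{-\mu}, \qquad \frac{\mathbf{P}(S_n = k+1)}{\mathbf{P}(S_n = k)} = \frac{n - k}{k + 1}\,\frac{p}{1 - p} \to \frac{\mu}{k + 1}. \qquad \Box$$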

Theorem 5.4.2 implies an analogue of the Poisson theorem in a more general case as well, when the $\xi_j^{(n)}$ are not necessarily identically distributed and can take values different from 0 and 1.


Corollary 5.4.2 Assume that $p_{jn} = \mathbf{P}(\xi_j^{(n)} = 1)$ depend on n and j so that

$$\max_j p_{jn} \to 0, \qquad \sum_{j=1}^n p_{jn} \to \mu > 0, \qquad \mathbf{P}(\xi_j^{(n)} = 0) = 1 - p_{jn} + o(p_{jn}).$$

Then (5.4.2) holds.


Proof To prove the corollary, one has to use Theorem 5.4.2 and the fact that

$$\mathbf{P}\Bigl(\bigcup_{j=1}^n \{\xi_j^{(n)} \ne 0,\ \xi_j^{(n)} \ne 1\}\Bigr) \le \sum_{j=1}^n o(p_{jn}) = o(1),$$

which means that, with probability tending to 1, all the variables $\xi_j^{(n)}$ assume the values 0 and 1 only. □


One can clearly obtain from Theorems 5.4.1 and 5.4.2 somewhat stronger assertions than the above. In particular,

$$\sup_B |\mathbf{P}(S_n \in B) - \Pi_\mu(B)| \to 0 \quad \text{as } n \to \infty.$$


4  An  extension  of  the  de  Moivre-Laplace  theorem  to  the  case  of  non-identically  distributed  random 
variables  is  contained  in  the  central  limit  theorem  from  Sect.  8.4. 
Note that under the assumptions of Theorem 5.4.1 this convergence will also take place in the case where $np \to \infty$ but only if $np^2 \to 0$. At the same time, the refinement of the de Moivre-Laplace theorem from Sect. 5.3 shows that the normal approximation for the distribution of $S_n$ holds if $np \to \infty$ (for simplicity we assume that $p \le q$ so that $npq \ge \frac12 np \to \infty$).

Thus there exist sequences $p \in \{p : np \to \infty,\ np^2 \to 0\}$ such that both the normal and the Poisson approximations are valid. In other words, the domains of applicability of the normal and Poisson approximations overlap.

We see further from Theorem 5.4.1 that the convergence rate in Corollary 5.4.1 is determined by a quantity of the order of $n^{-1}$. Since, as $n \to \infty$ (with $np = \mu$ fixed),

$$\mathbf{P}(S_n = 0) - \Pi_\mu(\{0\}) = e^{n\ln(1-p)} - e^{-\mu} \sim -\frac{\mu^2}{2n}\, e^{-\mu},$$

this estimate cannot be substantially improved. However, for large k (in the large deviations range, say) such an estimate for the difference

$$\mathbf{P}(S_n = k) - \Pi_\mu(\{k\})$$

becomes rough. (This is because, in (5.4.1), we neglected not only the different signs of the correction terms but also the rare events $\{S_n = k\}$ and $\{S_n^* = k\}$ that appear in the arguments of the probabilities.) Hence we see, as in Sect. 5.3, the necessity for having approximations of which both absolute and relative errors are small.

Now we will show that the asymptotic equivalence relations

$$\mathbf{P}(S_n = k) \sim \Pi_\mu(\{k\})$$

remain valid when k and $\mu$ grow (along with n) in such a way that

$$k = o(n^{2/3}), \qquad \mu = o(n^{2/3}), \qquad |k - \mu| = o(\sqrt{n}).$$


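Proof Indeed,

$$\mathbf{P}(S_n = k) = \binom{n}{k} p^k (1-p)^{n-k} = \frac{n(n-1)\cdots(n-k+1)}{k!}\, p^k (1-p)^{n-k} = \frac{\mu^k e^{-\mu}}{k!} \Bigl(1 - \frac1n\Bigr) \cdots \Bigl(1 - \frac{k-1}{n}\Bigr)(1-p)^{n-k} e^{pn} = \Pi_\mu(\{k\})\, e^{\varepsilon(k,n)}.$$

Thus we have to prove that, for values of k and $\mu$ from the indicated range,

$$\varepsilon(k, n) := \ln\Bigl[\Bigl(1 - \frac1n\Bigr) \cdots \Bigl(1 - \frac{k-1}{n}\Bigr)(1-p)^{n-k} e^{pn}\Bigr] = o(1). \tag{5.4.3}$$

We will obtain this relation together with the form of the correction term. Namely, we will show that

$$\varepsilon(k, n) = \frac{k - (k - \mu)^2}{2n} + O\Bigl(\frac{k^3 + \mu^3}{n^2}\Bigr) \tag{5.4.4}$$

and hence

$$\mathbf{P}(S_n = k) = \Bigl[1 + \frac{k - (k - \mu)^2}{2n} + O\Bigl(\frac{k^3 + \mu^3}{n^2}\Bigr)\Bigr] \Pi_\mu(\{k\}).$$

We make use of the fact that, as $\alpha \to 0$,

$$\ln(1 - \alpha) = -\alpha - \frac{\alpha^2}{2} + O(\alpha^3).$$

Then relations (5.4.3) and (5.4.4) will follow from the equalities

$$\sum_{j=1}^{k-1} \ln\Bigl(1 - \frac{j}{n}\Bigr) = -\sum_{j=1}^{k-1} \frac{j}{n} + O\Bigl(\frac{k^3}{n^2}\Bigr) = -\frac{k(k-1)}{2n} + O\Bigl(\frac{k^3}{n^2}\Bigr),$$

$$(n - k)\ln(1 - p) + pn = (n - k)\Bigl(-p - \frac{p^2}{2} + O(p^3)\Bigr) + pn = -\frac{\mu^2}{2n} + \frac{k\mu}{n} + O\Bigl(\frac{\mu^3 + k\mu^2}{n^2}\Bigr).$$

□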
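The expansion (5.4.4) can be checked numerically. The sketch below evaluates $\varepsilon(k, n)$ directly from its definition (5.4.3), compares it with the logarithm of the ratio $\mathbf{P}(S_n = k)/\Pi_\mu(\{k\})$ and with the leading term $(k - (k - \mu)^2)/(2n)$. The function names are illustrative assumptions and do not come from the text.

```python
import math

def eps_exact(n, k, mu):
    """epsilon(k, n) evaluated directly from definition (5.4.3), with p = mu / n."""
    p = mu / n
    s = sum(math.log1p(-j / n) for j in range(1, k))
    return s + (n - k) * math.log1p(-p) + p * n

def eps_leading(n, k, mu):
    """Leading term of (5.4.4)."""
    return (k - (k - mu) ** 2) / (2 * n)

def log_ratio(n, k, mu):
    """ln of P(S_n = k) / Pi_mu({k}) for the binomial distribution with p = mu / n."""
    p = mu / n
    log_binom = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                 + k * math.log(p) + (n - k) * math.log1p(-p))
    log_poisson = -mu + k * math.log(mu) - math.lgamma(k + 1)
    return log_binom - log_poisson

if __name__ == "__main__":
    n, k, mu = 10_000, 30, 20.0
    print(eps_exact(n, k, mu), eps_leading(n, k, mu), log_ratio(n, k, mu))
```

For these values the leading term equals $-0.0035$, and the remainder is of a much smaller order, in agreement with the $O((k^3 + \mu^3)/n^2)$ estimate.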


In conclusion we note that the approximate Poisson formula

$$\mathbf{P}(S_n = k) \approx \frac{\mu^k}{k!}\, e^{-\mu}$$

is widely used in various applications and has, as experience and the above estimates show, a rather high accuracy even for moderate values of n.

Now  we  consider  several  examples  of  the  use  of  the  de  Moivre-Laplace  and 
Poisson  theorems  for  approximate  computations. 


Example 5.4.1 Suppose we are given $10^4$ packets of grain. It is known that there are 5000 tagged grains in the packets. What is the probability that, in a particular fixed packet, there is at least one tagged grain? We can assume that the tagged grains are distributed to packets at random. Then the probability that a particular tagged grain will be in the chosen packet is $p = 10^{-4}$. Since there are 5000 such grains, this will be the number of trials, i.e. n = 5000. Define a random variable $\xi_k$ as follows: $\xi_k = 1$ if the k-th grain is in the chosen packet, and $\xi_k = 0$ otherwise. Then

$$S_{5000} = \sum_{k=1}^{5000} \xi_k$$

will be the number of tagged grains in our packet. By Theorem 5.4.1, $\mathbf{P}(S_{5000} = 0) \approx e^{-np} = e^{-0.5}$ so that the desired probability is approximately equal to $1 - e^{-0.5}$. The accuracy of this relation turns out to be rather high (by Theorem 5.4.1, the error does not exceed $2^{-1} \times 10^{-4}$). If we used the Poisson theorem instead of Theorem 5.4.1, we would have to imagine a triangular array of Bernoulli random variables, our $\xi_k$ constituting the 5000-th row of the array. Moreover, we would assume that, for the n-th row, one has $np_n = 0.5$. Thus the conditions of the Poisson theorem would be met and we could make use of the limit theorem to find the approximate equality we have already obtained.
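The numbers in Example 5.4.1 are small enough to be verified directly. The following sketch (the function name `grain_example` is a hypothetical label) compares the exact probability $1 - (1 - p)^n$ with the Poisson value $1 - e^{-0.5}$ and with the error bound $\mu^2/n$ of Theorem 5.4.1.

```python
import math

def grain_example(n=5000, p=1e-4):
    """Return (exact probability, Poisson approximation, bound mu^2 / n)."""
    mu = n * p
    exact = 1 - (1 - p) ** n          # P(S_n >= 1) in the Bernoulli scheme
    poisson = 1 - math.exp(-mu)       # 1 - Pi_mu({0})
    bound = mu ** 2 / n               # Theorem 5.4.1
    return exact, poisson, bound

if __name__ == "__main__":
    exact, poisson, bound = grain_example()
    print(exact, poisson, abs(exact - poisson), bound)
```

The actual discrepancy is several times smaller than the guaranteed bound $0.5 \times 10^{-4}$.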
Example 5.4.2 A similar argument can be used in the following problem. There are n dangerous bacteria in a reservoir of capacity V from which we take a sample of volume $v \ll V$. What is the probability that we will find the bacteria in the test sample?

One usually assumes that the probability p that any given bacterium will be in the test sample is equal to the ratio v/V. Moreover, it is also assumed that the presence of a given bacterium in the sample does not depend on whether the remaining n − 1 bacteria are in the test sample or not. In other words, one usually postulates that the mechanism of bacterial transfer into the test sample is equivalent to a sequence of n independent trials with "success" probability equal to p = v/V in each trial.

Introducing random variables $\xi_k$ as above, we obtain a description of the number of bacteria in the test sample by the sum $S_n = \sum_{k=1}^n \xi_k$ in the Bernoulli scheme. If nv is comparable in magnitude with V then by the Poisson theorem the desired probability will be equal to

$$\mathbf{P}(S_n > 0) \approx 1 - e^{-nv/V}.$$

Similar models are also used to describe the number of visible stars in a certain part of the sky far away from the Milky Way. Namely, it is assumed that if there are n visible stars in a region R then the probability that there are k visible stars in a subregion $r \subset R$ is

$$\binom{n}{k} p^k (1 - p)^{n-k},$$

where p is equal to the ratio S(r)/S(R) of the areas of the regions r and R respectively.


Example  5.4.3  Suppose  that  the  probability  that  a  newborn  baby  is  a  boy  is  constant 
and  equals  0.512  (see  Sect.  3.4.1). 

Consider a group of $10^4$ newborn babies and assume that it corresponds to a series of $10^4$ independent trials of which the outcomes are the events that either a boy or girl is born. What is the probability that the number of boys among these newborn babies will be greater than the number of girls by at least 200?

Define random variables $\xi_k$ as follows: $\xi_k = 1$ if the k-th baby is a boy and $\xi_k = 0$ otherwise. Then $S_n = \sum_{k=1}^n \xi_k$ is the number of boys in the group. The quantity $npq \approx 2.5 \times 10^3$ is rather large here, hence applying the integral limit (de Moivre-Laplace) theorem we obtain for the desired probability the value

$$\mathbf{P}(S_n \ge 5100) = 1 - \mathbf{P}\Bigl(\frac{S_n - np}{\sqrt{npq}} < \frac{5100 - 5120}{\sqrt{2500}}\Bigr) \approx 1 - \Phi(-20/50) = 1 - \Phi(-0.4) \approx 0.66.$$

To find the numerical values of $\Phi(x)$ one usually makes use of suitable statistical computer packages or calculators.
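As an illustration of the remark about computing $\Phi(x)$, the sketch below evaluates the normal approximation from the example through `math.erf` and, for comparison, sums the exact binomial probabilities. The function names are hypothetical, and the exact tail is accumulated through logarithms to avoid underflow.

```python
import math

def normal_cdf(x):
    """Phi(x) expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_upper_tail(n, p, k0):
    """Exact P(S_n >= k0) in the Bernoulli scheme."""
    log_p, log_q = math.log(p), math.log(1 - p)
    total = 0.0
    for k in range(k0, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * log_p + (n - k) * log_q)
        total += math.exp(log_pmf)
    return total

def boys_example(n=10_000, p=0.512, k0=5100):
    """Return (normal approximation, exact value) of P(S_n >= k0)."""
    z = (k0 - n * p) / math.sqrt(n * p * (1 - p))
    return 1.0 - normal_cdf(z), binom_upper_tail(n, p, k0)

if __name__ == "__main__":
    print(boys_example())
```

The two values differ only in the third decimal place, which is consistent with an error of the order $1/\sqrt{npq}$.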
In our example, $\Delta = 1/\sqrt{npq} \approx 1/50$, and a satisfactory approximation by the de Moivre-Laplace formula will certainly be ensured (see Theorem 5.3.1) for $c \le 2.5$.

If, however, we have to estimate the probability that the proportion of boys exceeds 0.55, we will be dealing with large deviation probabilities when to estimate $\mathbf{P}(S_n > 5500)$ one would rather use the approximate relation obtained in Sect. 1.3 by virtue of which (k = 0.45n, q = 0.488) one has

$$\mathbf{P}(S_n > 5500) \approx \frac{(n + 1 - k)q}{(n + 1)q - k}\, \mathbf{P}(S_n = 5500).$$

Applying Theorem 5.2.1 we find that

$$\mathbf{P}(S_n > 5500) \approx \frac{0.55\,q}{q - 0.45}\, \frac{1}{\sqrt{2\pi n \cdot 0.25}}\, e^{-nH(0.55)} \le \frac15\, e^{-25} < 10^{-11}.$$

Thus if we assume for a moment that 100 million babies are born on this planet each year and group them into batches of 10 thousand, then, to observe a group in which the proportion of boys exceeds the mean value by just 3.8 % we will have to wait, on average, 10 million years (see Example 4.1.1 in Sect. 4.1).

It  is  clear  that  the  normal  approximation  can  be  used  for  numerical  evaluation  of 
probabilities  for  the  problems  from  Example  5.4.3  provided  that  the  values  of  np 
are  large. 


5.5  Inequalities  for  Large  Deviation  Probabilities  in  the 
Bernoulli  Scheme 

In  conclusion  of  the  present  chapter  we  will  derive  several  useful  inequalities  for  the 
Bernoulli  scheme.  In  Sect.  5.2  we  introduced  the  function 

$$H(x) = x \ln\frac{x}{p} + (1 - x)\ln\frac{1 - x}{1 - p},$$

which plays an important role in Theorems 5.2.1 and 5.2.2 on the asymptotic behaviour of the probability $\mathbf{P}(S_n = k)$. We also considered there the basic properties of this function.

Theorem 5.5.1 For $z \ge 0$,

$$\mathbf{P}(S_n - np \ge z) \le \exp\{-nH(p + z/n)\}, \qquad \mathbf{P}(S_n - np \le -z) \le \exp\{-nH(p - z/n)\}. \tag{5.5.1}$$

Moreover, for all p,

$$H(p + x) \ge 2x^2, \tag{5.5.2}$$

so that each of the probabilities in (5.5.1) does not exceed $\exp\{-2z^2/n\}$ for any p.
To compare it with assertion (5.2.2) of Theorem 5.2.1, the first inequality from Theorem 5.5.1 can be re-written in the form

$$\mathbf{P}\Bigl(\frac{S_n}{n} \ge p^*\Bigr) \le \exp\{-nH(p^*)\}.$$

The inequalities (5.5.1) are close, to some extent, to the de Moivre-Laplace theorem since, for $z = o(n^{2/3})$,

$$-nH\Bigl(p + \frac{z}{n}\Bigr) = -\frac{z^2}{2npq} + o(1).$$

The last assertion, together with (5.5.2), can be interpreted as follows: deviating by z or more from the mean value np has the maximum probability when p = 1/2.

If $z/\sqrt{n} \to \infty$, then both probabilities in (5.5.1) converge to zero as $n \to \infty$ for they correspond to large deviations of the sum $S_n$ from the mean np. As we have already said, they are called large deviation probabilities.
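The chain of inequalities in Theorem 5.5.1 (exact tail, bound through the function H, and the cruder bound $\exp\{-2z^2/n\}$) can be illustrated by a direct computation. The sketch below is an illustration only; the names `deviation_function`, `upper_tail` and `tail_bounds` are assumptions introduced here.

```python
import math

def deviation_function(x, p):
    """H(x) = x ln(x/p) + (1 - x) ln((1 - x)/(1 - p)) for 0 < x < 1."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

def upper_tail(n, p, k0):
    """Exact P(S_n >= k0) in the Bernoulli scheme."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k0, n + 1))

def tail_bounds(n, p, k0):
    """Return (exact tail, exp{-n H(k0/n)}, exp{-2 z^2 / n}) with z = k0 - n p >= 0."""
    z = k0 - n * p
    exact = upper_tail(n, p, k0)
    by_h = math.exp(-n * deviation_function(k0 / n, p))   # first inequality in (5.5.1)
    by_square = math.exp(-2 * z ** 2 / n)                 # consequence of (5.5.2)
    return exact, by_h, by_square

if __name__ == "__main__":
    print(tail_bounds(100, 0.3, 40))
```

For n = 100, p = 0.3 and z = 10 the three numbers are increasing, as the theorem asserts, and the bound through H is visibly sharper than the one that is uniform in p.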


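Proof of Theorem 5.5.1 In Corollary 4.7.2 of the previous chapter we established the inequality

$$\mathbf{P}(\xi \ge x) \le e^{-\lambda x}\, \mathbf{E} e^{\lambda \xi}.$$

Applying it to the sum $S_n$ we get

$$\mathbf{P}(S_n \ge np + z) \le e^{-\lambda(np + z)}\, \mathbf{E} e^{\lambda S_n}.$$

Since $e^{\lambda S_n} = \prod_{k=1}^n e^{\lambda \xi_k}$ and the random variables $e^{\lambda \xi_k}$ are independent,

$$\mathbf{E} e^{\lambda S_n} = \prod_{k=1}^n \mathbf{E} e^{\lambda \xi_k} = (pe^\lambda + q)^n = \bigl(1 + p(e^\lambda - 1)\bigr)^n,$$

$$\mathbf{P}(S_n \ge np + z) \le \bigl[\bigl(1 + p(e^\lambda - 1)\bigr) e^{-\lambda(p + \alpha)}\bigr]^n, \quad \alpha = z/n.$$

The expression in brackets is equal to

$$\mathbf{E} e^{\lambda[\xi_k - (p + \alpha)]} = p e^{\lambda(1 - p - \alpha)} + (1 - p) e^{-\lambda(p + \alpha)}.$$

Therefore, being the sum of two convex functions, it is a convex function of $\lambda$. The equation for the minimum point $\lambda(\alpha)$ of the function has the form

$$-(p + \alpha)\bigl(1 + p(e^\lambda - 1)\bigr) + pe^\lambda = 0,$$

from which we find that

$$e^{\lambda(\alpha)} = \frac{(p + \alpha)q}{p(q - \alpha)},$$

$$\bigl(1 + p(e^{\lambda(\alpha)} - 1)\bigr) e^{-\lambda(\alpha)(p + \alpha)} = \frac{q}{q - \alpha} \Bigl[\frac{p(q - \alpha)}{(p + \alpha)q}\Bigr]^{p + \alpha} = \frac{p^{p+\alpha} q^{q-\alpha}}{(p + \alpha)^{p+\alpha} (q - \alpha)^{q-\alpha}} = \exp\Bigl\{-(p + \alpha)\ln\frac{p + \alpha}{p} - (q - \alpha)\ln\frac{q - \alpha}{q}\Bigr\} = \exp\{-H(p + \alpha)\}.$$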
The first of the inequalities (5.5.1) is proved. The second inequality follows from the first if we consider the latter as the inequality for the number of zeros.

It follows further from (5.2.1) that $H(p) = H'(p) = 0$ and $H''(x) = 1/(x(1 - x))$. Since the function $x(1 - x)$ attains its maximum value on the interval [0, 1] at the point $x = 1/2$, one has $H''(x) \ge 4$ and hence

$$H(p + \alpha) \ge \frac{\alpha^2}{2} \cdot 4 = 2\alpha^2. \qquad \Box$$

For  analogues  of  Theorem  5.5.1  for  sums  of  arbitrary  random  variables,  see 
Chap. 9 and Appendix 8. Example 9.1.2 shows that the function H(a) is the so-called
deviation function for the Bernoulli scheme. This function is important in
describing  large  deviation  probabilities. 


Chapter  6 

On  Convergence  of  Random  Variables 
and  Distributions 


Abstract In this chapter, several different types of convergence used in Probability Theory are defined and relationships between them are elucidated. Section 6.1 deals with convergence in probability and convergence with probability one (the almost sure convergence), presenting some criteria for them and, in particular, discussing the concept of Cauchy sequences (in probability and almost surely). Then the continuity theorem is established (convergence of functions of random variables) and the concept of uniform integrability is introduced and discussed, together with its consequences (in particular, for convergence in mean of suitable orders). Section 6.2 contains an extensive discussion of weak convergence of distributions. The chapter ends with Sect. 6.3 presenting criteria for weak convergence of distributions, including the concept of distribution determining classes of functions and that of tightness.


6.1  Convergence  of  Random  Variables 

In  previous  chapters  we  have  already  encountered  several  assertions  which  dealt 
with  convergence,  in  some  sense,  of  the  distributions  of  random  variables  or  of  the 
random  variables  themselves.  Now  we  will  give  definitions  of  different  types  of 
convergence  and  elucidate  the  relationships  between  them. 


6.1.1  Types  of  Convergence 

Let a sequence of random variables $\{\xi_n\}$ and a random variable $\xi$ be given on a probability space $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$.

Definition 6.1.1 The sequence $\{\xi_n\}$ converges in probability¹ to $\xi$ if, for any $\varepsilon > 0$,

$$\mathbf{P}(|\xi_n - \xi| > \varepsilon) \to 0 \quad \text{as } n \to \infty.$$

¹In the set-theoretic terminology, convergence in probability means convergence in measure.
One writes this as

$$\xi_n \xrightarrow{p} \xi \quad \text{as } n \to \infty.$$

In this notation, the assertion of the law of large numbers for the Bernoulli scheme could be written as

$$\frac{S_n}{n} \xrightarrow{p} p,$$

since $S_n/n$ can be considered as a sequence of random variables given on a common probability space.


Definition 6.1.2 We will say that the sequence $\xi_n$ converges to $\xi$ with probability 1 (or almost surely: $\xi_n \to \xi$ a.s., $\xi_n \xrightarrow{a.s.} \xi$), if $\xi_n(\omega) \to \xi(\omega)$ as $n \to \infty$ for all $\omega \in \Omega$ except for $\omega$ from a set $N \subset \Omega$ of null probability: $\mathbf{P}(N) = 0$. This convergence can also be called convergence almost everywhere (a.e.) with respect to the measure $\mathbf{P}$.

Convergence $\xi_n \xrightarrow{a.s.} \xi$ implies convergence $\xi_n \xrightarrow{p} \xi$. Indeed, if we assume that the convergence in probability does not take place then there exist $\varepsilon > 0$, $\delta > 0$, and a sequence $n_k$ such that, for the sequence of events $A_k = \{|\xi_{n_k} - \xi| > \varepsilon\}$, we have $\mathbf{P}(A_k) \ge \delta$ for all k. Let B consist of all elementary events belonging to infinitely many $A_k$, i.e. $B = \bigcap_{m=1}^{\infty} \bigcup_{k \ge m} A_k$. Then, clearly, for $\omega \in B$, the convergence $\xi_n(\omega) \to \xi(\omega)$ is impossible. But $B = \bigcap_{m=1}^{\infty} B_m$, where $B_m = \bigcup_{k \ge m} A_k$ are decreasing events ($B_{m+1} \subset B_m$), $\mathbf{P}(B_m) \ge \mathbf{P}(A_m) \ge \delta$ and, by the continuity axiom, $\mathbf{P}(B_m) \to \mathbf{P}(B)$ as $m \to \infty$. Therefore $\mathbf{P}(B) \ge \delta$ and a.s. convergence is impossible. The obtained contradiction proves the desired statement. □


The  converse  assertion,  that  convergence  in  probability  implies  a.s.  convergence, 
is,  generally  speaking,  not  true,  as  we  will  see  below.  However  in  one  important 
special  case  such  a  converse  holds  true. 


Theorem 6.1.1 If $\xi_n$ is monotonically increasing or decreasing then convergence $\xi_n \xrightarrow{p} \xi$ implies that $\xi_n \xrightarrow{a.s.} \xi$.

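Proof Assume, without loss of generality, that $\xi \equiv 0$, $\xi_n \ge 0$, $\xi_n \downarrow$ and $\xi_n \xrightarrow{p} \xi$. If convergence $\xi_n \xrightarrow{a.s.} \xi$ did not hold, there would exist an $\varepsilon > 0$ and a set $A$ with $\mathbf{P}(A) > \delta > 0$ such that $\sup_{k \ge n} \xi_k > \varepsilon$ for $\omega \in A$ and all $n$. But $\sup_{k \ge n} \xi_k = \xi_n$ and hence we have

$$\mathbf{P}(\xi_n > \varepsilon) \ge \mathbf{P}(A) > \delta > 0$$

for all $n$, which contradicts the assumed convergence $\xi_n \xrightarrow{p} 0$. □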

Thus convergence in probability is determined by the behaviour of the numerical sequence $\mathbf{P}(|\xi_n - \xi| > \varepsilon)$. Is it possible to characterise convergence with probability 1 in a similar way? Set $\zeta_n := \sup_{k \ge n} |\xi_k - \xi|$.

Corollary 6.1.1 $\xi_n \xrightarrow{a.s.} \xi$ if and only if $\zeta_n \xrightarrow{p} 0$, or, which is the same, when, for any $\varepsilon > 0$,

$$\mathbf{P}\Bigl(\sup_{k \ge n} |\xi_k - \xi| > \varepsilon\Bigr) \to 0 \quad \text{as } n \to \infty. \tag{6.1.1}$$


Proof Clearly $\xi_n \to \xi$ a.s. if and only if $\zeta_n \to 0$ a.s. But the sequence $\zeta_n$ decreases monotonically and it remains to make use of Theorem 6.1.1, which implies that $\zeta_n \xrightarrow{p} 0$ if and only if $\zeta_n \xrightarrow{a.s.} 0$. The corollary is proved. □


In the above argument, the random variables $\xi_n$ and $\xi$ could be improper, where the random variables $\xi_n$ and $\xi$ are only defined on a set $B$ and $\mathbf{P}(B) \in (0, 1)$. (These random variables can take infinite values on $\Omega \setminus B$.) In this case, all the considerations concerning convergence are carried out on the set $B \subset \Omega$ only.

In the introduced terminology, the assertion of the strong law of large numbers
for the Bernoulli scheme (Theorem 5.1.2) can be stated, by virtue of (6.1.1), as
convergence $S_n/n \to p$ with probability 1.

We have already noted that convergence almost surely implies convergence in probability. Now we will give an example showing that the converse assertion is, generally speaking, not true. Let $(\Omega, \mathfrak{F}, \mathbf{P})$ be the unit circle with the $\sigma$-algebra of Borel sets and uniform distribution. Put $\xi(\omega) \equiv 1$, $\xi_n(\omega) = 2$ on the arc $[r(n), r(n) + 1/n]$ and $\xi_n(\omega) = 1$ outside the arc. Here $r(n) = \sum_{k=1}^{n} \frac{1}{k}$. It is obvious that $\xi_n \xrightarrow{p} \xi$. At the same time, $r(n) \to \infty$ as $n \to \infty$, and the set on which $\xi_n$ converges to $\xi$ is empty (we can find no $\omega$ for which $\xi_n(\omega) \to \xi(\omega)$).
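
The construction can be checked numerically. The sketch below assumes, as an interpretation of the example, that the circle has unit circumference and that arc positions are taken modulo 1; the helper name `arc_hits` is hypothetical. For a fixed point $\omega$ it lists the indices $n$ at which the arc covers $\omega$, that is, at which the $n$-th variable equals 2 rather than the limit value 1. The probability of that event is the arc length $1/n$, which tends to zero, yet the list keeps growing as the arcs wrap around the circle.

```python
def arc_hits(omega: float, n_max: int) -> list[int]:
    """Indices n <= n_max for which the arc [r(n), r(n) + 1/n] (mod 1) covers omega."""
    hits = []
    r = 0.0  # running harmonic sum r(n) = 1 + 1/2 + ... + 1/n
    for n in range(1, n_max + 1):
        r += 1.0 / n
        start = r % 1.0                  # arc start on a circle of circumference 1
        offset = (omega - start) % 1.0   # distance from the arc start to omega, going forward
        if offset <= 1.0 / n:
            hits.append(n)
    return hits


if __name__ == "__main__":
    hits = arc_hits(0.3, 100_000)
    # The arc keeps returning to omega = 0.3, although its length 1/n shrinks to zero.
    print(len(hits), hits[:5], hits[-1])
```

Because consecutive arc starts differ by less than the current arc length, no point can be skipped during a full turn, which is why each revolution produces at least one hit.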

However, if $\mathbf{P}(|\xi_n - \xi| > \varepsilon)$ decreases as $n \to \infty$ sufficiently fast, then convergence in probability will also become a.s. convergence. In particular, relation (6.1.1) gives the following sufficient condition for convergence with probability 1.

Theorem 6.1.2 If the series $\sum_{n=1}^{\infty} \mathbf{P}(|\xi_n - \xi| > \varepsilon)$ converges for any $\varepsilon > 0$, then $\xi_n \xrightarrow{a.s.} \xi$.


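Proof This assertion is obvious, for

$$\mathbf{P}\Bigl(\bigcup_{k \ge n} \{|\xi_k - \xi| > \varepsilon\}\Bigr) \le \sum_{k=n}^{\infty} \mathbf{P}(|\xi_k - \xi| > \varepsilon). \qquad \Box$$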

It  is  this  criterion  that  has  actually  been  used  in  proving  the  strong  law  of  large 
numbers  for  the  Bernoulli  scheme. 
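
To make the use of this criterion concrete, here is a sketch of how it can be applied to the Bernoulli scheme through a fourth-moment Chebyshev bound. This is one standard route, given as an illustration rather than as a reproduction of the proof of Theorem 5.1.2; here $q = 1 - p$, and the fourth-moment formula for sums of independent centred summands is assumed as known.

```latex
\begin{align*}
\mathbf{E}(S_n - np)^4 &= n\,pq(1 - 3pq) + 3n(n-1)(pq)^2 \le n^2, \\
\mathbf{P}\Bigl(\Bigl|\frac{S_n}{n} - p\Bigr| > \varepsilon\Bigr)
  &\le \frac{\mathbf{E}(S_n - np)^4}{n^4 \varepsilon^4} \le \frac{1}{n^2 \varepsilon^4}, \\
\sum_{n=1}^{\infty} \mathbf{P}\Bigl(\Bigl|\frac{S_n}{n} - p\Bigr| > \varepsilon\Bigr)
  &\le \frac{1}{\varepsilon^4} \sum_{n=1}^{\infty} \frac{1}{n^2} < \infty .
\end{align*}
```

The series converges for every $\varepsilon > 0$, so the condition of Theorem 6.1.2 is satisfied.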

One cannot deduce a converse assertion about the convergence rate to zero of the probability $\mathbf{P}(|\xi_n - \xi| > \varepsilon)$ from the a.s. convergence. The reader can easily construct an example where $\xi_n \to \xi$ a.s., while $\mathbf{P}(|\xi_n - \xi| > \varepsilon)$ converges to zero arbitrarily slowly.

Theorem 6.1.2 implies the following result.

Corollary 6.1.2 If $\xi_n \xrightarrow{p} \xi$, then there exists a subsequence $\{n_k\}$ such that $\xi_{n_k} \to \xi$ a.s. as $k \to \infty$.

Proof This assertion is also obvious since it suffices to take $n_k$ such that $\mathbf{P}(|\xi_{n_k} - \xi| > \varepsilon) \le 1/k^2$ and then make use of Theorem 6.1.2. □


There is one more important special case where convergence in probability $\xi_n \xrightarrow{p} \xi$ implies convergence $\xi_n \to \xi$ a.s. This is the case when the $\xi_n$ are sums of independent random variables. Namely, the following assertion is true. If $\xi_n = \sum_{k=1}^{n} \eta_k$, where the $\eta_k$ are independent, then convergence of $\xi_n$ in probability implies convergence with probability 1. This assertion will be proved in Sect. 11.2.

Finally  we  consider  a  third  type  of  convergence  of  random  variables. 


Definition 6.1.3 We will say that $\xi_n$ converges to $\xi$ in the $r$-th order mean (in mean if $r = 1$; in mean square if $r = 2$) if, as $n \to \infty$,

$$\mathbf{E}|\xi_n - \xi|^r \to 0.$$

This convergence will be denoted by $\xi_n \xrightarrow{(r)} \xi$.


Clearly, by Chebyshev's inequality, $\xi_n \xrightarrow{(r)} \xi$ implies that $\xi_n \xrightarrow{p} \xi$. On the other hand, convergence $\xrightarrow{(r)}$ does not follow from a.s. convergence (and all the more from convergence in probability). Thus convergence in probability is the weakest of the three types of convergence we have introduced.
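
The Chebyshev step mentioned above can be written out in one line. The following display is a worked restatement for a fixed $\varepsilon > 0$, added here for illustration:

```latex
\[
\mathbf{P}(|\xi_n - \xi| > \varepsilon)
  = \mathbf{P}(|\xi_n - \xi|^r > \varepsilon^r)
  \le \frac{\mathbf{E}|\xi_n - \xi|^r}{\varepsilon^r} \to 0
  \quad \text{as } n \to \infty .
\]
```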

Note that, under additional conditions, convergence $\xi_n \xrightarrow{p} \xi$ can imply that $\xi_n \xrightarrow{(r)} \xi$ (see Theorem 6.1.7 below). For example, it will be shown in Corollary 6.1.4 that if $\xi_n \xrightarrow{p} \xi$ and $\mathbf{E}|\xi_n|^{r+\alpha} < c$ for some $\alpha > 0$, $c < \infty$ and all $n$, then $\xi_n \xrightarrow{(r)} \xi$.


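Definition 6.1.4 A sequence $\xi_n$ is said to be a Cauchy sequence in probability (a.s., in mean) if, for any $\varepsilon > 0$,

$$\mathbf{P}(|\xi_n - \xi_m| > \varepsilon) \to 0$$

$$\Bigl(\mathbf{P}\Bigl(\sup_{n \ge m} |\xi_n - \xi_m| > \varepsilon\Bigr) \to 0, \quad \mathbf{E}|\xi_n - \xi_m|^r \to 0\Bigr)$$

as $n \to \infty$ and $m \to \infty$.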

Theorem 6.1.3 (Cauchy convergence test) $\xi_n \to \xi$ in one of the senses $\xrightarrow{p}$, $\xrightarrow{a.s.}$ or $\xrightarrow{(r)}$ if and only if $\xi_n$ is a Cauchy sequence in the respective sense.

Proof That $\xi_n$ is a Cauchy sequence follows from convergence by virtue of the inequalities

$$\sup_{n \ge m} |\xi_n - \xi_m| \le \sup_{n \ge m} |\xi_n - \xi| + |\xi_m - \xi| \le 2 \sup_{n \ge m} |\xi_n - \xi|,$$

$$|\xi_n - \xi_m|^r \le C_r \bigl(|\xi_n - \xi|^r + |\xi_m - \xi|^r\bigr)$$

for some $C_r$.

Now assume that $\xi_n$ is a Cauchy sequence in probability. Choose a sequence $\{n_k\}$ such that

$$\mathbf{P}(|\xi_n - \xi_m| > 2^{-k}) < 2^{-k}$$

for $n \ge n_k$, $m \ge n_k$. Put

$$\xi'_k := \xi_{n_k}, \qquad A_k := \{|\xi'_k - \xi'_{k+1}| > 2^{-k}\}, \qquad \eta = \sum_{k=1}^{\infty} I(A_k).$$

Then $\mathbf{P}(A_k) \le 2^{-k}$ and $\mathbf{E}\eta = \sum_{k=1}^{\infty} \mathbf{P}(A_k) \le 1$. This means, of course, that the number of occurrences of the events $A_k$ is a proper random variable: $\mathbf{P}(\eta < \infty) = 1$, and hence with probability 1 finitely many events $A_k$ occur. This means that, for any $\omega$ for which $\eta(\omega) < \infty$, there exists a $k_0(\omega)$ such that $|\xi'_k(\omega) - \xi'_{k+1}(\omega)| \le 2^{-k}$ for all $k \ge k_0(\omega)$. Therefore one has the inequality $|\xi'_k(\omega) - \xi'_l(\omega)| \le 2^{-k+1}$ for all $k \ge k_0(\omega)$ and $l \ge k$, which means that $\xi'_k(\omega)$ is a numerical Cauchy sequence and hence there exists a value $\xi(\omega)$ such that $|\xi'_k(\omega) - \xi(\omega)| \to 0$ as $k \to \infty$. This means, in turn, that $\xi'_k \xrightarrow{a.s.} \xi$ and hence

$$\mathbf{P}(|\xi_n - \xi| > \varepsilon) \le \mathbf{P}(|\xi_n - \xi_{n_k}| \ge \varepsilon/2) + \mathbf{P}(|\xi_{n_k} - \xi| \ge \varepsilon/2) \to 0$$

as $n \to \infty$ and $k \to \infty$.

Now assume that $\xi_n$ is a Cauchy sequence in mean. Then, by Chebyshev's inequality, it will be a Cauchy sequence in probability and hence, by Corollary 6.1.2, there will exist a random variable $\xi$ and a subsequence $\{n_k\}$ such that $\xi_{n_k} \xrightarrow{a.s.} \xi$. Now we will show that $\mathbf{E}|\xi_n - \xi|^r \to 0$. For a given $\varepsilon > 0$, choose an $n$ such that $\mathbf{E}|\xi_k - \xi_l|^r < \varepsilon$ for $k \ge n$ and $l \ge n$. Then, by Fatou's lemma (see Appendix 3),

$$\mathbf{E}|\xi_n - \xi|^r = \mathbf{E} \lim_{n_k \to \infty} |\xi_n - \xi_{n_k}|^r = \mathbf{E} \liminf_{n_k \to \infty} |\xi_n - \xi_{n_k}|^r \le \liminf_{n_k \to \infty} \mathbf{E}|\xi_n - \xi_{n_k}|^r \le \varepsilon.$$

This means that $\mathbf{E}|\xi_n - \xi|^r \to 0$ as $n \to \infty$.

It remains to verify the assertion of the theorem related to a.s. convergence. We already know that if $\xi_n$ is a Cauchy sequence in probability (or a.s.) then there exist a $\xi$ and a subsequence $\xi_{n_k}$ such that $\xi_{n_k} \xrightarrow{a.s.} \xi$. Therefore, if we put $n_k(n) := \min\{n_k : n_k \ge n\}$, then

$$\mathbf{P}\Bigl(\sup_{k \ge n} |\xi_k - \xi| > \varepsilon\Bigr) \le \mathbf{P}\Bigl(\sup_{k \ge n} |\xi_k - \xi_{n_k(n)}| \ge \varepsilon/2\Bigr) + \mathbf{P}\bigl(|\xi_{n_k(n)} - \xi| \ge \varepsilon/2\bigr) \to 0$$

as $n \to \infty$. The theorem is proved. □


Remark 6.1.1 If we introduce the space $L_r$ of all random variables $\xi$ on $(\Omega, \mathfrak{F}, \mathbf{P})$ for which $\mathbf{E}|\xi|^r < \infty$ and the norm $\|\xi\| = (\mathbf{E}|\xi|^r)^{1/r}$ on it (the triangle inequality $\|\xi_1 + \xi_2\| \le \|\xi_1\| + \|\xi_2\|$ is then nothing else but Minkowski's inequality, see Theorem 4.7.2), then the assertion of Theorem 6.1.3 on convergence $\xrightarrow{(r)}$ (which is convergence in the norm of $L_r$, for we identify random variables $\xi_1$ and $\xi_2$ if $\|\xi_1 - \xi_2\| = 0$) means that $L_r$ is complete and hence is a Banach space.

The space of all random variables on $(\Omega, \mathfrak{F}, \mathbf{P})$ can be metrised so that convergence in the metric will be equivalent to convergence in probability. For instance, one could put

$$\rho(\xi_1, \xi_2) := \mathbf{E}\, \frac{|\xi_1 - \xi_2|}{1 + |\xi_1 - \xi_2|}.$$

Since

$$\frac{|x + y|}{1 + |x + y|} \le \frac{|x|}{1 + |x|} + \frac{|y|}{1 + |y|}$$

always holds, $\rho(\xi_1, \xi_2)$ satisfies all the axioms of a metric. It is not difficult to see that relations $\rho(\xi_n, \xi) \to 0$ and $\xi_n \xrightarrow{p} \xi$ are equivalent. The assertion of Theorem 6.1.3 related to convergence $\xrightarrow{p}$ means that the metric space we introduced is complete.
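
As a small check of how this metric behaves, consider, purely as an illustrative assumption, a random variable that equals a value $v$ with probability $q$ and 0 otherwise, compared with the zero variable. Its distance to zero is $q\,|v|/(1 + |v|)$, which can be small even when the mean $q\,|v|$ is not. The function names below are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def rho_to_zero(value: int, prob: Fraction) -> Fraction:
    """rho(xi, 0) = E[|xi| / (1 + |xi|)] for xi = value with probability prob, else 0."""
    v = abs(Fraction(value))
    return prob * v / (1 + v)


def mean_abs(value: int, prob: Fraction) -> Fraction:
    """E|xi| for the same two-point variable."""
    return prob * abs(Fraction(value))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        q = Fraction(1, n)
        # Distance to zero in the metric shrinks like 1/(n + 1), while the mean stays equal to 1.
        print(n, rho_to_zero(n, q), mean_abs(n, q))
```

With $v = n$ and $q = 1/n$ the metric distance tends to zero, which matches convergence in probability, while convergence in mean fails.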


6.1.2  The  Continuity  Theorem 

Now  we  will  derive  the  following  “continuity  theorem”. 

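Theorem 6.1.4 Let $\xi_n \xrightarrow{a.s.} \xi$ ($\xi_n \xrightarrow{p} \xi$) and $H(s)$ be a function continuous everywhere with respect to the distribution of the random variable $\xi$ (i.e. $H(s)$ is continuous at each point of a set $S$ such that $\mathbf{P}(\xi \in S) = 1$). Then

$$H(\xi_n) \xrightarrow{a.s.} H(\xi) \qquad \bigl(H(\xi_n) \xrightarrow{p} H(\xi)\bigr).$$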


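Proof Let $\xi_n \xrightarrow{a.s.} \xi$. Since the sets $A = \{\omega : \xi_n(\omega) \to \xi(\omega)\}$ and $B = \{\omega : \xi(\omega) \in S\}$ are both of probability 1, $\mathbf{P}(AB) = \mathbf{P}(A) + \mathbf{P}(B) - \mathbf{P}(A \cup B) = 1$. But one has $H(\xi_n) \to H(\xi)$ on the set $AB$. Convergence with probability 1 is proved.

Now let $\xi_n \xrightarrow{p} \xi$. If we assume that convergence $H(\xi_n) \xrightarrow{p} H(\xi)$ does not take place then there will exist $\varepsilon > 0$, $\delta > 0$ and a subsequence $\{n'\}$ such that

$$\mathbf{P}\bigl(|H(\xi_{n'}) - H(\xi)| > \varepsilon\bigr) > \delta.$$

But $\xi_{n'} \xrightarrow{p} \xi$ and hence there exists a subsequence $\{n''\}$ such that $\xi_{n''} \xrightarrow{a.s.} \xi$ and $H(\xi_{n''}) \xrightarrow{a.s.} H(\xi)$. This contradicts the assumption we made, for the latter implies that

$$\mathbf{P}\bigl(|H(\xi_{n''}) - H(\xi)| > \varepsilon\bigr) > \delta.$$

The theorem is proved. □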

6.1.3  Uniform  Integrability  and  Its  Consequences 

Now we will consider this question: in what cases does convergence in probability
imply convergence in mean?

The main condition that ensures the transition from convergence in probability
to convergence in mean is associated with the notion of uniform integrability.

Definition 6.1.5 A sequence $\{\xi_n\}$ is said to be uniformly integrable if

$$\sup_n \mathbf{E}(|\xi_n|; |\xi_n| > N) \to 0 \quad \text{as } N \to \infty.$$

A  sequence  of  independent  identically  distributed  random  variables  with  finite 
mean  is,  clearly,  uniformly  integrable. 

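If $\{\xi_n\}$ is uniformly integrable then so are $\{c\xi_n\}$ and $\{\xi_n + c\}$, where $c = \text{const}$. Let us present some further, less evident, properties of uniform integrability.

U1. If the sequences $\{\xi'_n\}$ and $\{\xi''_n\}$ are uniformly integrable then the sequences defined by $\zeta_n = \max(|\xi'_n|, |\xi''_n|)$ and $\zeta_n = \xi'_n + \xi''_n$ are also uniformly integrable.

Proof Indeed, for $\zeta_n = \max(|\xi'_n|, |\xi''_n|)$ we have

$$\begin{aligned}
\mathbf{E}(\zeta_n; \zeta_n > N) &= \mathbf{E}(\zeta_n; \zeta_n > N, |\xi'_n| > |\xi''_n|) + \mathbf{E}(\zeta_n; \zeta_n > N, |\xi'_n| \le |\xi''_n|) \\
&\le \mathbf{E}(|\xi'_n|; |\xi'_n| > N) + \mathbf{E}(|\xi''_n|; |\xi''_n| > N) \to 0
\end{aligned}$$

as $N \to \infty$.

Since

$$|\xi'_n + \xi''_n| \le |\xi'_n| + |\xi''_n| \le 2\max(|\xi'_n|, |\xi''_n|),$$

from the above it follows that the sequence defined by the sum $\zeta_n = \xi'_n + \xi''_n$ is also uniformly integrable. □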


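U2. If $\{\xi_n\}$ is uniformly integrable then $\sup_n \mathbf{E}|\xi_n| \le c < \infty$.

Proof Indeed, choose $N$ so that

$$\sup_n \mathbf{E}(|\xi_n|; |\xi_n| > N) \le 1.$$

Then

$$\sup_n \mathbf{E}|\xi_n| = \sup_n \bigl[\mathbf{E}(|\xi_n|; |\xi_n| \le N) + \mathbf{E}(|\xi_n|; |\xi_n| > N)\bigr] \le N + 1. \qquad \Box$$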

The converse assertion is not true. For example, for a sequence

$$\xi_n : \quad \mathbf{P}(\xi_n = n) = 1/n = 1 - \mathbf{P}(\xi_n = 0)$$

one has $\mathbf{E}|\xi_n| = 1$, but the sequence is not uniformly integrable.
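
The failure of uniform integrability in this example can be seen by computing the truncated expectations explicitly: the tail expectation beyond level $N$ equals 1 as soon as $n > N$ and 0 otherwise, so its supremum over $n$ stays equal to 1 for every $N$. A minimal sketch with hypothetical function names, using exact fractions:

```python
from fractions import Fraction


def tail_expectation(n: int, N: int) -> Fraction:
    """E(|xi_n|; |xi_n| > N) when P(xi_n = n) = 1/n and P(xi_n = 0) = 1 - 1/n."""
    # Only the atom at n can exceed N, and it contributes n * (1/n).
    return n * Fraction(1, n) if n > N else Fraction(0)


def sup_tail(N: int, n_max: int) -> Fraction:
    """Maximum of the tail expectation over n <= n_max (a lower bound for the supremum over all n)."""
    return max(tail_expectation(n, N) for n in range(1, n_max + 1))


if __name__ == "__main__":
    for N in (1, 10, 100, 999):
        # The value does not decrease with N, so the defining limit in Definition 6.1.5 fails.
        print(N, sup_tail(N, 1000))
```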

If  we  somewhat  strengthen  the  above  statement  U2,  it  becomes  “characteristic” 
for  uniform  integrability. 


Theorem 6.1.5 For a sequence $\{\xi_n\}$ to be uniformly integrable, it is necessary and sufficient that there exists a function $\psi(x)$ such that

$$\frac{\psi(x)}{x} \uparrow \infty \quad \text{as } x \uparrow \infty, \qquad \sup_n \mathbf{E}\psi(|\xi_n|) \le c < \infty. \tag{6.1.2}$$

In the necessity assertion one can choose a convex function $\psi$.

Proof Without loss of generality we can assume that $\xi_n \ge 0$.

The sufficiency is evident, since, putting $v(x) := \psi(x)/x$, we get

$$\mathbf{E}(\xi_n; \xi_n \ge N) \le \frac{1}{v(N)} \mathbf{E}(\xi_n v(\xi_n); \xi_n \ge N) \le \frac{c}{v(N)}.$$

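To prove the necessity, put

$$\varepsilon(N) := \sup_n \mathbf{E}(\xi_n; \xi_n \ge N).$$

Then, by virtue of uniform integrability, $\varepsilon(N) \downarrow 0$ as $N \uparrow \infty$. Choose a sequence $N_k \uparrow \infty$ as $k \uparrow \infty$ such that

$$\sum_{k=1}^{\infty} \sqrt{\varepsilon(N_k)} < c_1 < \infty,$$

and put

$$g(x) = x\,(\varepsilon(N_k))^{-1/2} \quad \text{for } x \in [N_k, N_{k+1}).$$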
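Since

$$\frac{g(N_k - 0)}{N_k} = (\varepsilon(N_{k-1}))^{-1/2} \le (\varepsilon(N_k))^{-1/2} = \frac{g(N_k)}{N_k},$$

we have $g(x)/x \uparrow \infty$ as $x \to \infty$. Further,

$$\begin{aligned}
\mathbf{E} g(\xi_n) &= \sum_k \mathbf{E}\bigl[g(\xi_n); \xi_n \in [N_k, N_{k+1})\bigr] \\
&= \sum_k \mathbf{E}\bigl[\xi_n (\varepsilon(N_k))^{-1/2}; \xi_n \in [N_k, N_{k+1})\bigr] \\
&\le \sum_k (\varepsilon(N_k))^{-1/2} \varepsilon(N_k) = \sum_k \sqrt{\varepsilon(N_k)} < c_1,
\end{aligned}$$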

where the right-hand side does not depend on $n$. Therefore, to prove the theorem it is sufficient to construct a function $\psi \le g$ which is convex and such that $\psi(x)/x \uparrow \infty$ as $x \uparrow \infty$.

Define the function $\psi(x)$ as the continuous polygon with nodes $(N_k, g(N_k - 0))$. Since

$$\frac{g(N_k - 0)}{N_k} = \varepsilon(N_{k-1})^{-1/2}$$

monotonically increases as $k$ grows, $\psi$ is a lower envelope curve for the discontinuous function $g(x) \ge \psi(x)$. The monotonicity of $\psi(x)/x$ follows from the fact that, on the interval $[N_k, N_{k+1})$, this function can be represented as

$$\frac{\psi(x)}{x} = a_{k,\psi} - \frac{b_k}{x},$$

where $b_k \ge 0$, because the values $\psi(N_{k+1} - 0)$ and $g(N_{k+1} - 0)$ coincide, while the angular incline $a_{k,\psi}$ of the function $\psi$ on the interval $[N_k, N_{k+1})$ is greater than the "radial" incline $a_{k,g}$ of the function $g$:

$$\frac{g(N_{k+1} - 0) - g(N_k)}{N_{k+1} - N_k} \le \frac{g(N_{k+1} - 0) - g(N_k - 0)}{N_{k+1} - N_k}.$$

It is clear that $\psi(x)/x$ increases unboundedly, for

$$\frac{\psi(N_k)}{N_k} = \frac{g(N_k - 0)}{N_k} = \varepsilon(N_{k-1})^{-1/2} \uparrow \infty$$

as $k \to \infty$. The theorem is proved. □
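
A common concrete choice in (6.1.2), offered here only as an illustration, is $\psi(x) = x^{1+\alpha}$ with some $\alpha > 0$; this function is convex and $\psi(x)/x = x^{\alpha} \uparrow \infty$. Under the assumption $\sup_n \mathbf{E}|\xi_n|^{1+\alpha} \le c$, the sufficiency bound from the proof then reads:

```latex
\[
\sup_n \mathbf{E}\bigl(|\xi_n|;\ |\xi_n| \ge N\bigr)
  \le \frac{1}{N^{\alpha}} \sup_n \mathbf{E}|\xi_n|^{1+\alpha}
  \le \frac{c}{N^{\alpha}} \to 0
  \quad \text{as } N \to \infty .
\]
```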


In  studying  the  mean  values  of  sums  of  random  variables,  the  following  theorem 
on  uniform  integrability  of  average  values ,  following  from  Theorem  6.1.5,  plays  an 
important  role. 


Theorem 6.1.6 Let $\xi_1, \xi_2, \ldots$ be an arbitrary uniformly integrable sequence of random variables,

$$p_{i,n} \ge 0, \qquad \sum_{i=1}^{n} p_{i,n} = 1, \qquad \zeta_n = \sum_{i=1}^{n} |\xi_i|\, p_{i,n}.$$

Then the sequence $\{\zeta_n\}$ is uniformly integrable as well.


Proof Let $\psi(x)$ be the convex function from Theorem 6.1.5 satisfying properties (6.1.2). Then, by that theorem and the convexity of $\psi$,

$$\mathbf{E}\psi\Bigl(\sum_{i=1}^{n} p_{i,n} |\xi_i|\Bigr) \le \mathbf{E} \sum_{i=1}^{n} p_{i,n} \psi(|\xi_i|) \le c.$$

It remains to make use of Theorem 6.1.5 again. □


Now we will show that convergence in probability together with uniform integrability imply convergence in mean.

Theorem 6.1.7 Let $\xi_n \xrightarrow{p} \xi$ and $\{\xi_n\}$ be uniformly integrable. Then $\mathbf{E}|\xi|$ exists and, as $n \to \infty$,

$$\mathbf{E}|\xi_n - \xi| \to 0.$$

If, moreover, $\{|\xi_n|^r\}$ is uniformly integrable then $\xi_n \xrightarrow{(r)} \xi$.

Conversely, if, for an $r \ge 1$, $\xi_n \xrightarrow{(r)} \xi$ and $\mathbf{E}|\xi|^r < \infty$, then $\{|\xi_n|^r\}$ is uniformly integrable.


In the law of large numbers for the Bernoulli scheme (see Theorem 5.1.1) we proved that the normed sum $S_n/n$ converges to $p$ in probability. Since $0 \le S_n/n \le 1$, $S_n/n$ is clearly uniformly integrable and the convergence in mean $\mathbf{E}|S_n/n - p|^r \to 0$ holds for any $r$. This fact can also be established directly. For a more substantive example of application of Theorems 6.1.6 and 6.1.7, see Sect. 8.1.


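Proof We show that $\mathbf{E}\xi$ exists. By the properties of integrals (see Lemma A3.2.3 in Appendix 3), if $\mathbf{E}|\zeta| < \infty$ then $\mathbf{E}(\zeta; A_n) \to 0$ as $\mathbf{P}(A_n) \to 0$. Since $\mathbf{E}\min(|\xi|, N) \le N < \infty$, for any $N$ and $\varepsilon$ one has

$$\begin{aligned}
\mathbf{E}\min(|\xi|, N) &= \lim_{n \to \infty} \mathbf{E}\bigl[\min(|\xi|, N); |\xi_n - \xi| < \varepsilon\bigr] \\
&\le \limsup_{n \to \infty} \mathbf{E}\min(|\xi_n| + \varepsilon, N) \le c + \varepsilon,
\end{aligned}$$

where $c$ is the bound for $\sup_n \mathbf{E}|\xi_n|$ from property U2. It follows that $\mathbf{E}|\xi| \le c$.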

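Further, for brevity, put $\eta_n = |\xi_n - \xi|$. Then $\eta_n \xrightarrow{p} 0$ and $\eta_n$ are uniformly integrable together with $\xi_n$. For any $N$ and $\varepsilon$, one has

$$\begin{aligned}
\mathbf{E}\eta_n &= \mathbf{E}(\eta_n; \eta_n \le \varepsilon) + \mathbf{E}(\eta_n; N \ge \eta_n > \varepsilon) + \mathbf{E}(\eta_n; \eta_n > N) \\
&\le \varepsilon + N\,\mathbf{P}(\eta_n \ge \varepsilon) + \mathbf{E}(\eta_n; \eta_n > N).
\end{aligned} \tag{6.1.3}$$

Choose $N$ so that $\sup_n \mathbf{E}(\eta_n; \eta_n > N) \le \varepsilon$. Then, for such an $N$,

$$\limsup_{n \to \infty} \mathbf{E}\eta_n \le 2\varepsilon.$$

Since $\varepsilon$ is arbitrary, $\mathbf{E}\eta_n \to 0$ as $n \to \infty$.

The relation $\mathbf{E}|\xi_n - \xi|^r \to 0$ can be proved in the same way as (6.1.3), since $\eta_n^r = |\xi_n - \xi|^r \xrightarrow{p} 0$ and $\eta_n^r$ are uniformly integrable together with $|\xi_n|^r$.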

Now we will prove the converse assertion. Let, for simplicity, $r = 1$. One has

$$\begin{aligned}
\mathbf{E}(|\xi_n|; |\xi_n| > N) &\le \mathbf{E}(|\xi_n - \xi|; |\xi_n| > N) + \mathbf{E}(|\xi|; |\xi_n| > N) \\
&\le \mathbf{E}|\xi_n - \xi| + \mathbf{E}(|\xi|; |\xi_n| > N) \\
&\le \mathbf{E}|\xi_n - \xi| + \mathbf{E}(|\xi|; |\xi_n - \xi| > 1) + \mathbf{E}(|\xi|; |\xi| > N - 1).
\end{aligned}$$

The first term on the right-hand side tends to zero by the assumption, and the second term, by Lemma A3.2.3 from Appendix 3, which we have just mentioned, and the fact that $\mathbf{P}(|\xi_n - \xi| > 1) \to 0$. The last term does not depend on $n$ and can be made arbitrarily small by choosing $N$. Theorem 6.1.7 is proved. □


Now  we  can  derive  yet  another  continuity  theorem  which  has  the  following  form. 


Theorem 6.1.8 If $\xi_n \xrightarrow{p} \xi$, $H(s)$ satisfies the conditions of Theorem 6.1.4, and $H(\xi_n)$ is uniformly integrable, then, as $n \to \infty$,

$$\mathbf{E}|H(\xi_n) - H(\xi)| \to 0$$

and, in particular, $\mathbf{E}H(\xi_n) \to \mathbf{E}H(\xi)$.

This assertion follows from Theorems 6.1.4 and 6.1.7, for $H(\xi_n) \xrightarrow{p} H(\xi)$ by Theorem 6.1.4.

Sometimes it is convenient to distinguish between left and right uniform integrability. We will say that a sequence $\{\xi_n\}$ is right (left) uniformly integrable if

$$\sup_n \mathbf{E}(\xi_n; \xi_n \ge N) \to 0 \qquad \Bigl(\sup_n \mathbf{E}(|\xi_n|; \xi_n \le -N) \to 0\Bigr)$$

as $N \to \infty$. It is evident that a sequence $\{\xi_n\}$ is uniformly integrable if and only if it is both right and left uniformly integrable.

Lemma 6.1.1 A sequence $\{\xi_n\}$ is right uniformly integrable if at least one of the following conditions is met:

1. For any sequence $N(n) \to \infty$ as $n \to \infty$, one has

$$\mathbf{E}(\xi_n; \xi_n > N(n)) \to 0.$$

(This condition is clearly also necessary for uniform integrability.)

2. $\xi_n \le \eta$, where $\mathbf{E}\eta < \infty$.

3. $\mathbf{E}(\xi_n^+)^{1+\alpha} < c < \infty$ for some $\alpha > 0$ (here $v^+ = \max(0, v)$).

4. $\xi_n$ is left uniformly integrable, $\xi_n \xrightarrow{p} \xi$, $\mathbf{E}\xi_n \to \mathbf{E}\xi < \infty$.

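Proof

1. If the sequence $\{\xi_n\}$ were not right uniformly integrable, there would exist an $\varepsilon > 0$ and subsequences $n' \to \infty$ and $N' = N'(n') \to \infty$ such that $\mathbf{E}(\xi_{n'}; \xi_{n'} > N') > \varepsilon$. But this contradicts condition 1.

2. $\mathbf{E}(\xi_n; \xi_n > N) \le \mathbf{E}(\eta; \eta > N) \to 0$ as $N \to \infty$.

3. $\mathbf{E}(\xi_n; \xi_n > N) \le \mathbf{E}(\xi_n^{1+\alpha} N^{-\alpha}; \xi_n > N) \le N^{-\alpha} c \to 0$ as $N \to \infty$.

4. Without loss of generality, put $\xi := 0$. Then

$$\mathbf{E}(\xi_n; \xi_n > N) = \mathbf{E}\xi_n - \mathbf{E}(\xi_n; \xi_n < -N) - \mathbf{E}(\xi_n; |\xi_n| \le N).$$

The first two terms on the right-hand side vanish as $n \to \infty$ for any $N = N(n) \to \infty$. For the last term, for any $\varepsilon > 0$, one has

$$\begin{aligned}
|\mathbf{E}(\xi_n; |\xi_n| \le N)| &\le |\mathbf{E}(\xi_n; |\xi_n| \le \varepsilon)| + |\mathbf{E}(\xi_n; \varepsilon < |\xi_n| \le N)| \\
&\le \varepsilon + N\,\mathbf{P}(|\xi_n| > \varepsilon).
\end{aligned}$$

For any given $\varepsilon > 0$, choose an $n(\varepsilon)$ such that, for all $n \ge n(\varepsilon)$, we would have $\mathbf{P}(|\xi_n| > \varepsilon) < \varepsilon$, and put $N(\varepsilon) := \lfloor 1/\sqrt{\varepsilon} \rfloor$. This will mean that, for all $n \ge n(\varepsilon)$ and $N \le N(\varepsilon)$, one has $|\mathbf{E}(\xi_n; |\xi_n| \le N)| \le \varepsilon + \sqrt{\varepsilon}$, and therefore condition 1 of the lemma holds for $\mathbf{E}(\xi_n; \xi_n > N)$. The lemma is proved. □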

Now,  based  on  the  above,  we  can  state  three  useful  corollaries. 

Corollary 6.1.3 (The dominated convergence theorem) If $\xi_n \xrightarrow{p} \xi$, $|\xi_n| < \eta$, and $\mathbf{E}\eta < \infty$ then $\mathbf{E}\xi$ exists and $\mathbf{E}\xi_n \to \mathbf{E}\xi$.

Corollary 6.1.4 If $\xi_n \xrightarrow{p} \xi$ and $\mathbf{E}|\xi_n|^{r+\alpha} < c < \infty$ for some $\alpha > 0$ then $\xi_n \xrightarrow{(r)} \xi$.

Corollary 6.1.5 If $\xi_n \xrightarrow{p} \xi$ and $H(x)$ is a continuous bounded function, then $\mathbf{E}|H(\xi_n) - H(\xi)| \to 0$ as $n \to \infty$.

In conclusion of the present section, we will derive one more auxiliary proposition
that can be useful.

Lemma 6.1.2 (On integrals over sets of small probability) If $\{\xi_n\}$ is a uniformly integrable sequence and $\{A_n\}$ is an arbitrary sequence of events such that $\mathbf{P}(A_n) \to 0$, then $\mathbf{E}(|\xi_n|; A_n) \to 0$ as $n \to \infty$.

Proof Put $B_n := \{|\xi_n| \le N\}$. Then

$$\begin{aligned}
\mathbf{E}(|\xi_n|; A_n) &= \mathbf{E}(|\xi_n|; A_n B_n) + \mathbf{E}(|\xi_n|; A_n \overline{B}_n) \\
&\le N\,\mathbf{P}(A_n) + \mathbf{E}(|\xi_n|; |\xi_n| > N).
\end{aligned}$$

For a given $\varepsilon > 0$, first choose $N$ so that the second summand on the right-hand side does not exceed $\varepsilon/2$ and then an $n$ such that the first summand does not exceed $\varepsilon/2$. We obtain that, by choosing $n$ large enough, we can make $\mathbf{E}(|\xi_n|; A_n)$ less than $\varepsilon$. The lemma is proved. □


6.2  Convergence  of  Distributions 

In Sect. 6.1 we introduced three types of convergence which can be used to characterise
the closeness of random variables given on a common probability space. But
what  can  one  do  if  random  variables  are  given  on  different  probability  spaces  (or  if 
it  is  not  known  where  they  are  given)  which  nevertheless  have  similar  distributions? 
(Recall,  for  instance,  the  Poisson  or  de  Moivre-Laplace  theorems.)  In  such  cases 
one  should  be  able  to  characterise  the  closeness  of  the  distributions  themselves. 
Having  found  an  apt  definition  for  such  a  closeness,  in  many  problems  we  will  be 
able  to  approximate  the  required  but  hard  to  come  by  distributions  by  known  and, 
as  a  rule,  simpler  distributions. 

Now what distributions should be considered as close? We are clearly looking for a definition of convergence of a sequence of distribution functions $F_n(x)$ to a distribution function $F(x)$. It would be natural, for instance, that the distributions of the variables $\xi_n = \xi + 1/n$ should converge to that of $\xi$ as $n \to \infty$. Therefore requiring in the definition of convergence that $\sup_x |F_n(x) - F(x)|$ is small would be unreasonable since this condition is not satisfied for the distributions of $\xi + 1/n$ and $\xi$ if $F(x) = \mathbf{P}(\xi < x)$ has at least one point of discontinuity.
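
The simplest instance of this remark is the degenerate case in which the limit variable is identically 0 (an illustrative assumption), with the convention that a distribution function at $x$ is the probability of the event that the variable is strictly less than $x$. The distribution functions of the shifted variables then converge at every $x \ne 0$, while their uniform distance to the limit equals 1 for every $n$. A minimal sketch with hypothetical function names:

```python
def F(x: float) -> float:
    """Distribution function P(xi < x) of the variable xi that is identically 0."""
    return 1.0 if x > 0 else 0.0


def F_n(x: float, n: int) -> float:
    """Distribution function P(xi + 1/n < x) of the shifted variable."""
    return 1.0 if x > 1.0 / n else 0.0


def gap_at(x: float, n: int) -> float:
    """Pointwise distance |F_n(x) - F(x)|."""
    return abs(F_n(x, n) - F(x))


if __name__ == "__main__":
    for n in (1, 10, 1000):
        # At x = 1/(2n) the two functions differ by 1, so the uniform distance never shrinks,
        # while at the fixed point x = 0.01 the gap vanishes once 1/n < 0.01.
        print(n, gap_at(1.0 / (2 * n), n), gap_at(0.01, n))
```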

We will define the convergence of $F_n$ to $F$ as that which arises when one considers convergence in probability.

Definition 6.2.1 We will say that distribution functions $F_n$ converge weakly to a distribution function $F$ as $n \to \infty$, and denote this by $F_n \Rightarrow F$ if, for any continuous bounded function $f(x)$,

$$\int f(x)\,dF_n(x) \to \int f(x)\,dF(x). \tag{6.2.1}$$

Considering the distributions $\mathbf{F}_n(B)$ and $\mathbf{F}(B)$ ($B$ are Borel sets) corresponding to $F_n$ and $F$, we say that $\mathbf{F}_n$ converges weakly to $\mathbf{F}$ and write $\mathbf{F}_n \Rightarrow \mathbf{F}$. One can clearly re-write (6.2.1) as

$$\int f(x)\,\mathbf{F}_n(dx) \to \int f(x)\,\mathbf{F}(dx) \quad \text{or} \quad \mathbf{E}f(\xi_n) \to \mathbf{E}f(\xi) \tag{6.2.2}$$

(cf. Corollary 6.1.5), where $\xi_n \mathrel{\subset\!\!\!=} \mathbf{F}_n$ and $\xi \mathrel{\subset\!\!\!=} \mathbf{F}$.


Another  possible  definition  of  weak  convergence  follows  from  the  next  assertion. 

Theorem 6.2.1 $F_n \Rightarrow F$ if and only if $F_n(x) \to F(x)$ at each point of continuity $x$ of $F$.


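Proof Let (6.2.1) hold. Consider an $\varepsilon > 0$ and a continuous function $f_\varepsilon(t)$ which is equal to 1 for $t < x$ and to 0 for $t \ge x + \varepsilon$, and varies linearly on $[x, x + \varepsilon]$. Since

$$F_n(x) = \int_{-\infty}^{x} f_\varepsilon(t)\,dF_n(t) \le \int f_\varepsilon(t)\,dF_n(t),$$

by virtue of (6.2.1) one has

$$\limsup_{n \to \infty} F_n(x) \le \int f_\varepsilon(t)\,dF(t) \le F(x + \varepsilon).$$

If $x$ is a point of continuity of $F$ then

$$\limsup_{n \to \infty} F_n(x) \le F(x)$$

since $\varepsilon$ is arbitrary.

In the same way, using the function $f_\varepsilon^*(t) = f_\varepsilon(t + \varepsilon)$, we obtain the inequality

$$\liminf_{n \to \infty} F_n(x) \ge F(x).$$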

We now prove the converse assertion. Let $-M$ and $N$ be points of continuity of $F$ such that $F(-M) < \varepsilon/5$ and $1 - F(N) < \varepsilon/5$. Then $F_n(-M) < \varepsilon/4$ and $1 - F_n(N) < \varepsilon/4$ for all sufficiently large $n$. Therefore, assuming for simplicity that $|f| \le 1$, we obtain that

$$\int f\,dF_n \quad \text{and} \quad \int f\,dF \tag{6.2.3}$$

will differ from

$$\int_{-M}^{N} f\,dF_n \quad \text{and} \quad \int_{-M}^{N} f\,dF,$$

respectively, by less than $\varepsilon/2$. Construct on the semi-interval $(-M, N]$ a step function $f_\varepsilon$ with jumps at the points of continuity of $F$ which differs from $f$ by less than $\varepsilon/2$. Outside $(-M, N]$ we set $f_\varepsilon := 0$. We can put, for instance,

$$f_\varepsilon(x) := \sum_{j=1}^{k} f(x_j)\,\delta_j(x),$$

where $x_0 = -M < x_1 < \cdots < x_k = N$ are appropriately chosen points of continuity of $F$, and $\delta_j(x)$ is the indicator function of the semi-interval $(x_{j-1}, x_j]$. Then $\int f_\varepsilon\,dF_n$ and $\int f_\varepsilon\,dF$ will differ from the respective integrals in (6.2.3), for sufficiently large $n$, by less than $\varepsilon$. At the same time,

$$\int f_\varepsilon\,dF_n = \sum_{j=1}^{k} f(x_j)\bigl[F_n(x_j) - F_n(x_{j-1})\bigr] \to \int f_\varepsilon\,dF.$$

Since $\varepsilon > 0$ is arbitrary, the last relation implies (6.2.1). (Indeed, one just has to make use of the inequality

$$\limsup \int f\,dF_n \le \varepsilon + \limsup \int f_\varepsilon\,dF_n = \varepsilon + \int f_\varepsilon\,dF \le 2\varepsilon + \int f\,dF$$

and a similar inequality for $\liminf \int f\,dF_n$.) The theorem is proved. □

2 In many texts on probability theory the condition of the theorem is given as the definition of weak convergence. However, the definition in terms of the relation (6.2.2) is apparently more appropriate for it continues to remain valid for distributions on arbitrary topological spaces (see, e.g. [1, 25]).


For  remarks  on  different  and,  in  a  certain  sense,  simpler  proofs  of  the  second 
assertion  of  Theorem  6.2.1,  see  the  end  of  Sect.  6.3  and  Sect.  7.4. 


Remark 6.2.1 Repeating with obvious modifications the above-presented proof, we can get a somewhat different equivalent of weak convergence $F_n \Rightarrow F$: convergence of differences $F_n(y) - F_n(x) \to F(y) - F(x)$ for any points of continuity $x$ and $y$ of $F$.

Remark 6.2.2 If $F(x)$ is continuous then convergence $F_n \Rightarrow F$ is equivalent to the uniform convergence $\sup_x |F_n(x) - F(x)| \to 0$.

We leave the proof of the last assertion to the reader. It follows from the fact that convergence $F_n(x) \to F(x)$ at any $x$ implies, by virtue of the continuity of $F$, uniform convergence on any finite interval. The uniform smallness of $F_n(x) - F(x)$ on the "tails" is ensured by the smallness of $F(x)$ and $1 - F(x)$.

Remark 6.2.3 If distributions $F_n$ and $F$ are discrete and have jumps at the same points $x_1, x_2, \ldots$ then $F_n \Rightarrow F$ will clearly be equivalent to the convergence of the probabilities of the values $x_1, x_2, \ldots$ ($F_n(x_k + 0) - F_n(x_k) \to F(x_k + 0) - F(x_k)$).

We introduce some notation which will be convenient for the sequel. Let $\xi_n$ and $\xi$ be some random variables (given, generally speaking, on different probability spaces) such that $\xi_n \mathrel{\subset\!\!\!=} \mathbf{F}_n$ and $\xi \mathrel{\subset\!\!\!=} \mathbf{F}$.

Definition 6.2.2 If $F_n \Rightarrow F$ we will say that $\xi_n$ converges to $\xi$ in distribution and write $\xi_n \Rightarrow \xi$.

We used here the same symbol $\Rightarrow$ as for the weak convergence, but this leads to no confusion.

It is clear that $\xi_n \xrightarrow{p} \xi$ implies $\xi_n \Rightarrow \xi$, but not vice versa.

At  the  same  time  the  following  assertion  holds  true. 

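Lemma 6.2.1 If $\xi_n \Rightarrow \xi$ ($F_n \Rightarrow F$) then one can construct random variables $\xi'_n$ and $\xi'$ on a common probability space so that $\mathbf{P}(\xi'_n < x) = \mathbf{P}(\xi_n < x) = F_n(x)$, $\mathbf{P}(\xi' < x) = \mathbf{P}(\xi < x) = F(x)$, and $\xi'_n \xrightarrow{a.s.} \xi'$.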


Proof Define the quantile transforms (see Definition 3.2.6) by

$$F_n^{(-1)}(t) := \sup\{x : F_n(x) < t\}, \qquad F^{(-1)}(t) := \sup\{x : F(x) < t\}.$$

(If $F(x)$ is continuous and strictly increasing then $F^{(-1)}(t)$ coincides with the solution to the equation $F(x) = t$.) Let $\eta \mathrel{\subset\!\!\!=} \mathbf{U}_{0,1}$. Put

$$\xi'_n := F_n^{(-1)}(\eta), \qquad \xi' := F^{(-1)}(\eta)$$

(cf. Theorem 3.2.2), and show that $\xi'_n \xrightarrow{a.s.} \xi'$. In order to do that, it suffices to prove that $F_n^{(-1)}(y) \to F^{(-1)}(y)$ for almost all $y \in [0, 1]$.

The functions $F$ and $F^{(-1)}$ are monotone and hence each of them has at most a countable set of discontinuity points. This means that, for all $y \in [0, 1]$ with the possible exclusion of the points from a countable set $T$, the function $F^{(-1)}(y)$ will be continuous.

So let $y$ be a point of continuity of $F^{(-1)}$ and $F^{(-1)}(y) = x$.

For $t \le y$, choose a continuous strictly increasing function $G^{(-1)}(t)$ such that

$$G^{(-1)}(y) = F^{(-1)}(y), \qquad G^{(-1)}(t) \le F^{(-1)}(t) \quad \text{for } t < y.$$

Denote by $G(v)$, $v \le x$, the function inverse to $G^{(-1)}(t)$. Clearly, $G(v)$ dominates the function $F(v)$ in the domain $v \le x$. By virtue of the continuity and strict monotonicity of the functions $G^{(-1)}$ and $G$ (in the domain under consideration), for $\varepsilon > 0$ we have

$$G(x - \varepsilon) = y - \delta(\varepsilon),$$

where  8(s)  >  0,  8(e)  ->  0  as  e  — >  0.  Choose  an  e  such  that  x  —  e  is  a  point  of 
continuity  of  F .  Then,  for  all  n  large  enough, 


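$$F_n(x - \varepsilon) \le F(x - \varepsilon) + \frac{\delta(\varepsilon)}{2} \le G(x - \varepsilon) + \frac{\delta(\varepsilon)}{2} = y - \frac{\delta(\varepsilon)}{2}.$$

Since the right-hand side is less than $y$, the point $x - \varepsilon$ belongs to the set $\{v : F_n(v) < y\}$, and hence $F_n^{(-1)}(y) \ge x - \varepsilon = F^{(-1)}(y) - \varepsilon$. The opposite inequality can be proved in a similar way. Since $\varepsilon$ can be arbitrarily
small, we obtain that, for almost all $y$,

$$F_n^{(-1)}(y) \to F^{(-1)}(y) \quad \text{as } n \to \infty.$$

Hence $F_n^{(-1)}(\eta) \to F^{(-1)}(\eta)$ with probability 1 with respect to the distribution of $\eta$.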

The  lemma  is  proved.  □ 
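
The construction used in the proof can be tried out numerically. The following is only a minimal sketch under assumptions that are not in the text: $F_n$ is taken to be the exponential distribution with rate $1 + 1/n$ and $F$ the exponential distribution with rate 1 (chosen because their quantile transforms are explicit), and the function names are hypothetical.

```python
import math
import random

def quantile_exp(rate, u):
    """Quantile transform of the exponential law with the given rate:
    the solution x of 1 - exp(-rate * x) = u, valid for 0 <= u < 1."""
    return -math.log(1.0 - u) / rate

def coupled_pair(n, u):
    """Return (xi_n', xi') built from the same uniform value u."""
    return quantile_exp(1.0 + 1.0 / n, u), quantile_exp(1.0, u)

random.seed(0)
u = random.random()  # one realisation of eta, uniform on [0, 1)
gaps = []
for n in (1, 10, 100, 1000):
    xi_n, xi = coupled_pair(n, u)
    gaps.append(abs(xi_n - xi))
print(gaps)  # the gaps shrink towards 0 along this fixed outcome
```

Both variables are functions of the same $\eta$, which is what places them on a common probability space; the marginal laws are unchanged because each quantile transform is applied to a uniform variable.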

Lemma 6.2.1 remains true for vector-valued random variables as well.

Sometimes it is also convenient to have a simple symbol for the relation “the
distribution of $\xi_n$ converges weakly to $\mathbf{F}$”. We will write this relation as

$$\xi_n \mathrel{\subset\!\!\Rightarrow} \mathbf{F}, \qquad (6.2.4)$$

so that the symbol $\mathrel{\subset\!\!\Rightarrow}$ expresses the same fact as $\Rightarrow$ but relates objects of a different
nature in the same way as the symbol $\mathrel{\subset\!\!=}$ in the relation $\xi \mathrel{\subset\!\!=} \mathbf{P}$ (on the left-hand
side in (6.2.4) we have random variables, while on the right-hand side there is a
distribution).

In these terms, the assertion of the Poisson theorem could be written as $S_n \mathrel{\subset\!\!\Rightarrow} \boldsymbol{\Pi}_\mu$,
while the statement of the law of large numbers for the Bernoulli scheme takes the
form $S_n/n \mathrel{\subset\!\!\Rightarrow} \mathbf{I}_p$.

The coincidence of the distributions of $\xi$ and $\eta$ will be denoted by $\xi \stackrel{d}{=} \eta$.

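Lemma 6.2.2 If $\xi_n \Rightarrow \xi$ and $\varepsilon_n \xrightarrow{p} 0$ then $\xi_n + \varepsilon_n \Rightarrow \xi$.

If $\xi_n \Rightarrow \xi$ and $\gamma_n \xrightarrow{p} 1$ then $\xi_n \gamma_n \Rightarrow \xi$.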

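Proof Let us prove the first assertion. For any $t$ and $\delta > 0$ such that $t$ and $t \pm \delta$ are
points of continuity of $\mathbf{P}(\xi < t)$, one has

$$\limsup_{n \to \infty} \mathbf{P}(\xi_n + \varepsilon_n < t) = \limsup_{n \to \infty} \mathbf{P}(\xi_n + \varepsilon_n < t,\ \varepsilon_n > -\delta) \le \limsup_{n \to \infty} \mathbf{P}(\xi_n < t + \delta) = \mathbf{P}(\xi < t + \delta).$$

Similarly,

$$\liminf_{n \to \infty} \mathbf{P}(\xi_n + \varepsilon_n < t) \ge \mathbf{P}(\xi < t - \delta).$$

Since $\mathbf{P}(\xi < t \pm \delta)$ can be made arbitrarily close to $\mathbf{P}(\xi < t)$ by taking a sufficiently
small $\delta$, the required convergence follows.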

The  second  assertion  can  be  proved  in  the  same  way.  The  lemma  is  proved.  □ 

Now  we  will  give  analogues  of  Theorems  6.1.4  and  6.1.7  in  terms  of  distribu¬ 
tions. 

Theorem 6.2.2 If $\xi_n \Rightarrow \xi$ and a function $H(s)$ satisfies the conditions of Theorem 6.1.4 then $H(\xi_n) \Rightarrow H(\xi)$.

Theorem 6.2.3 If $\xi_n \Rightarrow \xi$ and the sequence $\{\xi_n\}$ is uniformly integrable then $\mathbf{E}\xi$
exists and $\mathbf{E}\xi_n \to \mathbf{E}\xi$.


Proof There are two ways of proving these theorems. One of them consists of reducing them to Theorems 6.1.4 and 6.1.7. To this end, one has to construct random
variables $\xi'_n = F_n^{(-1)}(\eta)$ and $\xi' = F^{(-1)}(\eta)$, where $\eta \mathrel{\subset\!\!=} \mathbf{U}_{0,1}$ and $F_n^{(-1)}$ and $F^{(-1)}$
are the quantile transforms of $F_n$ and $F$, respectively, and prove that $\xi'_n \xrightarrow{p} \xi'$ (we
already know that $F^{(-1)}(\eta) \mathrel{\subset\!\!=} F$; if $F$ is discontinuous or not strictly increasing,
then $F^{(-1)}$ should be defined as in Lemma 6.2.1).

Another approach is to prove the theorems anew using the language of distributions. Under inessential additional assumptions, such proofs are sometimes even
simpler. To illustrate this, assume, for instance, in Theorem 6.2.2 that the function
$H$ is continuous. One has to prove that $\mathbf{E} g(H(\xi_n)) \to \mathbf{E} g(H(\xi))$ for any continuous
bounded function $g$. But this is an immediate consequence of (6.2.1) and (6.2.2), for
$f = g \circ H$ ($f$ is the composition of the functions $g$ and $H$).

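In Theorem 6.2.3 assume that $\xi_n \ge 0$ (this does not restrict the generality). Then,
integrating by parts, we get

$$\mathbf{E}\xi_n = -\int_0^\infty x\,d\mathbf{P}(\xi_n > x) = \int_0^\infty \mathbf{P}(\xi_n > x)\,dx. \qquad (6.2.5)$$

Since by virtue of uniform integrability

$$\sup_n \int_N^\infty \mathbf{P}(\xi_n > x)\,dx \le \sup_n \mathbf{E}(\xi_n;\ \xi_n \ge N) \to 0$$

as $N \to \infty$, the integral in (6.2.5) is uniformly convergent. Moreover, $\mathbf{P}(\xi_n > x) \to \mathbf{P}(\xi > x)$ for almost all $x$, and therefore

$$\lim_{n \to \infty} \mathbf{E}\xi_n = \lim_{n \to \infty} \int_0^\infty \mathbf{P}(\xi_n > x)\,dx = \int_0^\infty \mathbf{P}(\xi > x)\,dx = \mathbf{E}\xi.$$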
□ 
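
The tail-integral representation (6.2.5) of the expectation of a nonnegative random variable is easy to check numerically. The sketch below is an illustration only; the exponential law with rate 2, the truncation point and the function name are assumptions, not part of the text.

```python
import math

def tail_integral(tail, upper=60.0, steps=60_000):
    """Midpoint-rule approximation of the integral of P(xi > x) over [0, upper]."""
    h = upper / steps
    return sum(tail((j + 0.5) * h) for j in range(steps)) * h

rate = 2.0
# For the exponential law with this rate, P(xi > x) = exp(-rate * x) and E xi = 1 / rate.
mean_from_tail = tail_integral(lambda x: math.exp(-rate * x))
print(mean_from_tail, 1.0 / rate)
```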


Conditions  ensuring  uniform  integrability  are  contained  in  Lemma  6.1.1.  Now 
we  will  give  a  modification  of  assertion  4  of  this  lemma  for  the  case  of  weak  con¬ 
vergence. 


Lemma 6.2.3 If $\{\xi_n\}$ is left uniformly integrable, $\xi_n \Rightarrow \xi$ and $\mathbf{E}\xi_n \to \mathbf{E}\xi$ then $\{\xi_n\}$
is uniformly integrable.


We suggest that the reader construct examples showing that all three conditions
of the lemma are essential.

Lemma 6.2.3 implies, in particular, that if $\xi_n \ge 0$, $\xi_n \Rightarrow \xi$ and $\mathbf{E}\xi_n \to \mathbf{E}\xi$ then
$\{\xi_n\}$ is uniformly integrable.

As for Theorems 6.2.2 and 6.2.3, two alternative ways to prove the result are
possible here. One of them consists of using Lemma 6.1.1. We will present here a
different, somewhat simpler, proof.

Proof of Lemma 6.2.3 For simplicity assume that $\xi_n \ge 0$. Suppose that the lemma
is not valid. Then there exist an $\varepsilon > 0$ and subsequences $n' \to \infty$ and $N(n') \to \infty$
such that

$$\mathbf{E}(\xi_{n'};\ \xi_{n'} > N(n')) > \varepsilon.$$

Since

$$\mathbf{E}\xi_{n'} = \mathbf{E}(\xi_{n'};\ \xi_{n'} \le N) + \mathbf{E}(\xi_{n'};\ \xi_{n'} > N),$$

for any $N$ that is a point of continuity of the distribution of $\xi$, one has

$$\mathbf{E}\xi = \lim_{n' \to \infty} \mathbf{E}\xi_{n'} \ge \mathbf{E}(\xi;\ \xi \le N) + \varepsilon.$$

Choose an $N$ such that the first summand on the right-hand side exceeds $\mathbf{E}\xi - \varepsilon/2$.
Then we obtain the contradiction $\mathbf{E}\xi > \mathbf{E}\xi + \varepsilon/2$, which proves the lemma.

We leave it to the reader to extend the proof to the case of arbitrary left uniformly
integrable $\{\xi_n\}$. □

The  following  theorem  can  also  be  useful. 


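Theorem 6.2.4 Suppose that $\xi_n \Rightarrow \xi$, $H(s)$ is differentiable at a point $a$, and
$b_n \to 0$ as $n \to \infty$. Then

$$\frac{1}{b_n}\big(H(a + b_n \xi_n) - H(a)\big) \Rightarrow H'(a)\,\xi.$$

If $H'(a) = 0$ and $H''(a)$ exists then

$$\frac{1}{b_n^2}\big(H(a + b_n \xi_n) - H(a)\big) \Rightarrow \frac{1}{2} H''(a)\,\xi^2.$$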


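Proof Consider the function

$$h(x) = \begin{cases} \dfrac{H(a + x) - H(a)}{x} & \text{if } x \ne 0, \\ H'(a) & \text{if } x = 0, \end{cases}$$

which is continuous at the point $x = 0$. Since $b_n \xi_n \Rightarrow 0$, by Theorem 6.2.2 one has
$h(b_n \xi_n) \Rightarrow h(0) = H'(a)$. Using the theorem again (this time for two-dimensional
distributions), we get

$$\frac{H(a + b_n \xi_n) - H(a)}{b_n} = h(b_n \xi_n)\,\xi_n \Rightarrow H'(a)\,\xi.$$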


The  second  assertion  is  proved  in  the  same  way. 


□ 
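
Both limits in Theorem 6.2.4 rest on deterministic difference quotients, so they can be checked outcome by outcome. The sketch below is an illustration under assumptions of our own: a fixed number `x` stands in for one value of $\xi_n$, and the test functions $H(s) = e^s$ and $H(s) = \cos s$ with $a = 0$ are chosen only because their derivatives are known.

```python
import math

def first_order(H, a, b, x):
    """(H(a + b*x) - H(a)) / b, the quantity in the first assertion."""
    return (H(a + b * x) - H(a)) / b

def second_order(H, a, b, x):
    """(H(a + b*x) - H(a)) / b**2, the quantity in the second assertion."""
    return (H(a + b * x) - H(a)) / b ** 2

x = 1.7  # a fixed value standing in for one outcome of xi_n
for b in (1e-1, 1e-2, 1e-3):
    # H(s) = exp(s), a = 0: the limit is H'(0) * x = x
    err1 = abs(first_order(math.exp, 0.0, b, x) - x)
    # H(s) = cos(s), a = 0: H'(0) = 0 and the limit is H''(0) * x**2 / 2 = -x**2 / 2
    err2 = abs(second_order(math.cos, 0.0, b, x) + x ** 2 / 2)
    print(b, err1, err2)  # both errors decrease as b decreases
```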


A multivariate analogue of this theorem will look somewhat more complicated.
The reader could obtain it himself, following the lines of the argument proving Theorem 6.2.4.

6.3 Conditions for Weak Convergence

Now we will return to the concept of weak convergence. We have two criteria for this
convergence: relation (6.2.1) and Theorem 6.2.1. However, from the point of view
of their possible applications (their verification in concrete problems) both these
criteria are inconvenient. For instance, proving, say, convergence $\mathbf{E} f(\xi_n) \to \mathbf{E} f(\xi)$
not for all continuous bounded functions $f$ but just for elements $f$ of a certain rather
narrow class of functions that has a simple and clear nature would be much easier.
It is obvious, however, that such a class cannot be very narrow.

Before  stating  the  basic  assertions,  we  will  introduce  a  few  concepts. 

Extend the class $\mathcal{F}$ of all distribution functions to the class $\mathcal{G}$ of all functions
$G$ satisfying conditions F1 and F2 from Sect. 3.2 and conditions $G(-\infty) \ge 0$,
$G(\infty) \le 1$. Functions $G$ from $\mathcal{G}$ could be called generalised distribution functions.
One can think of them as distribution functions of improper random variables assuming infinite values with positive probabilities, so that $G(-\infty) = \mathbf{P}(\xi = -\infty)$
and $1 - G(\infty) = \mathbf{P}(\xi = \infty)$. We will write $G_n \Rightarrow G$ for $G_n \in \mathcal{G}$ and $G \in \mathcal{G}$ if
$G_n(x) \to G(x)$ at all points of continuity of $G(x)$.

Theorem 6.3.1 (Helly) The class $\mathcal{G}$ is compact with respect to convergence $\Rightarrow$,
i.e. from any sequence $\{G_n\}$, $G_n \in \mathcal{G}$, one can choose a convergent subsequence

$$G_{n_k} \Rightarrow G \in \mathcal{G}.$$


For  the  proof  of  Theorem  6.3.1,  see  Appendix  4. 

Corollary 6.3.1 If each convergent subsequence $\{G_{n_k}\}$ of $\{G_n\}$ with $G_n \in \mathcal{G}$ converges to $G$ then $G_n \Rightarrow G$.

Proof If $G_n \not\Rightarrow G$ then there exists a point of continuity $x_0$ of $G$ such that $G_n(x_0) \not\to G(x_0)$. Since $G_n(x_0) \in [0, 1]$, there exists a convergent subsequence $G_{n_k}$ such
that $G_{n_k}(x_0) \to g \ne G(x_0)$. This, however, is impossible by our assumption, for
$G_{n_k}(x_0) \to G(x_0)$. □

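The reason for extending the class $\mathcal{F}$ of all distribution functions is that it is not
compact (in the sense of Theorem 6.3.1) and convergence $F_n \Rightarrow G$, $F_n \in \mathcal{F}$, does
not imply that $G \in \mathcal{F}$. For example, the sequence

$$F_n(x) = \begin{cases} 0 & \text{if } x \le -n, \\ 1/2 & \text{if } -n < x \le n, \\ 1 & \text{if } x > n \end{cases} \qquad (6.3.1)$$

converges everywhere to the function $G(x) \equiv 1/2 \notin \mathcal{F}$ corresponding to an improper
random variable taking the values $\pm\infty$ with probabilities 1/2.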
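
The loss of mass to infinity described here can be made concrete. The sketch below assumes, as the stated limit suggests, that $F_n$ is the distribution function of a variable equal to $-n$ or $n$ with probability 1/2 each; the function names are hypothetical.

```python
def F_n(n, x):
    """Distribution function P(xi_n < x) when xi_n = -n or n with probability 1/2 each."""
    if x <= -n:
        return 0.0
    if x <= n:
        return 0.5
    return 1.0

def mass_in(n, N):
    """Probability that xi_n lies in the segment [-N, N]."""
    return 1.0 if n <= N else 0.0

print([F_n(n, 3.0) for n in (1, 2, 5, 50)])        # equals 0.5 once n >= 3
print(min(mass_in(n, 10) for n in range(1, 100)))  # 0.0: no single N captures the mass for all n
```

For every fixed $x$ the values $F_n(x)$ settle at 1/2, while the mass inside any fixed segment eventually drops to zero, so the limit is not a proper distribution function.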

However, dealing with the class $\mathcal{G}$ is also not very convenient. The fact is that
convergence at points of continuity $G_n \Rightarrow G$ in the class $\mathcal{G}$ is not equivalent to
convergence

$$\int f\,dG_n \to \int f\,dG$$

(see example (6.3.1) for $f \equiv 1$), and the integrals $\int f\,dG$ do not specify $G$ uniquely
(they specify the increments of $G$, but not the values $G(-\infty)$ and $G(\infty)$). Now we
will introduce two concepts that will help to avoid the above-mentioned inconvenience.


Definition 6.3.1 A sequence of distributions $\{\mathbf{F}_n\}$ (or distribution functions $\{F_n\}$)
is said to be tight if, for any $\varepsilon > 0$, there exists an $N$ such that

$$\inf_n \mathbf{F}_n([-N, N]) > 1 - \varepsilon. \qquad (6.3.2)$$


Definition 6.3.2 A class $\mathcal{L}$ of continuous bounded functions is said to be distribution determining if the equality

$$\int f(x)\,dF(x) = \int f(x)\,dG(x), \qquad F \in \mathcal{F},\ G \in \mathcal{G},$$

for all $f \in \mathcal{L}$ implies that $F = G$ (or, which is the same, if the relation $\mathbf{E} f(\xi) = \mathbf{E} f(\eta)$ for all $f \in \mathcal{L}$, where one of the random variables $\xi$ and $\eta$ is proper, implies
that $\xi \stackrel{d}{=} \eta$).


The  next  theorem  is  the  main  result  of  the  present  section. 


Theorem 6.3.2 Let $\mathcal{L}$ be a distribution determining class and $\{\mathbf{F}_n\}$ a sequence of
distributions. For the existence of a distribution $\mathbf{F} \in \mathcal{F}$ such that $\mathbf{F}_n \Rightarrow \mathbf{F}$ it is necessary and sufficient that:³

(1) the sequence $\{\mathbf{F}_n\}$ is tight; and

(2) $\lim_{n \to \infty} \int f\,dF_n$ exists for all $f \in \mathcal{L}$.


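Proof The necessary part is obvious.

Sufficiency. By Theorem 6.3.1 there exists a subsequence $F_{n_k} \Rightarrow F \in \mathcal{G}$. But
by condition (1) one has $F \in \mathcal{F}$. Indeed, if $x > N$ is a point of continuity of $F$
then, by Definition 6.3.1, $F(x) = \lim F_{n_k}(x) \ge 1 - \varepsilon$. In a similar way we establish
that for $x \le -N$ one has $F(x) \le \varepsilon$. Since $\varepsilon$ is arbitrary, we have $F(-\infty) = 0$ and
$F(\infty) = 1$.

Further, take another convergent subsequence $F_{n'_k} \Rightarrow G \in \mathcal{F}$. Then, for any
$f \in \mathcal{L}$, one has

$$\lim_{k \to \infty} \int f\,dF_{n_k} = \int f\,dF, \qquad \lim_{k \to \infty} \int f\,dF_{n'_k} = \int f\,dG. \qquad (6.3.3)$$

But, by condition (2),

$$\lim_{k \to \infty} \int f\,dF_{n_k} = \lim_{k \to \infty} \int f\,dF_{n'_k}, \qquad (6.3.4)$$

and hence $F = G$. The theorem is proved by virtue of Corollary 6.3.1. □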


³ In this form the theorem persists for spaces of a more general nature. The role of the segments
$[-N, N]$ in (6.3.2) is played in that case by compact sets (cf. [1, 14, 25, 31]).

Fig. 6.1 The plot of the function $f_{a,\varepsilon}(x)$ from Example 6.3.1


If one needs to prove convergence to a “known” distribution $F \in \mathcal{F}$, the tightness
condition in Theorem 6.3.2 becomes redundant.


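Corollary 6.3.2 Let $\mathcal{L}$ be a distribution determining class and

$$\int f\,dF_n \to \int f\,dF, \qquad F \in \mathcal{G}, \qquad (6.3.5)$$

for any $f \in \mathcal{L}$. Moreover, assume that at least one of the following three conditions
is met:

(1) the sequence $\{F_n\}$ is tight;

(2) $F \in \mathcal{F}$;

(3) $f \equiv 1 \in \mathcal{L}$ (i.e. (6.3.5) holds for $f \equiv 1$).

Then $F \in \mathcal{F}$ and $F_n \Rightarrow F$.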

The proof of the corollary is almost obvious. Under condition (1) the assertion follows immediately from Theorem 6.3.2. Condition (3) and convergence
(6.3.5) imply condition (2). If (2) holds, then $F \in \mathcal{F}$ in relations (6.3.3) and (6.3.4),
and therefore $G = F$. □

Since, as a rule, at least one of conditions (1)-(3) is satisfied (as we will see
below), the basic task is to verify convergence (6.3.5) for the class $\mathcal{L}$.

Note also that, in the case where one proves convergence to a distribution $F \in \mathcal{F}$
“known” in advance, the whole arrangement of the argument can be different and
simpler. One such alternative approach is presented in Sect. 7.4.

Now we will give several examples of distribution determining classes $\mathcal{L}$.

Example 6.3.1 The class $\mathcal{L}_0$ of functions having the form

$$f_{a,\varepsilon}(x) = \begin{cases} 1 & \text{if } x \le a, \\ 0 & \text{if } x \ge a + \varepsilon. \end{cases}$$

On the segment $[a, a + \varepsilon]$ the functions $f_{a,\varepsilon}$ are defined to be linear and continuous
(a plot of $f_{a,\varepsilon}(x)$ is given in Fig. 6.1). It is a two-parameter family of functions.

We show that $\mathcal{L}_0$ is a distribution determining class. Let

$$\int f\,dF = \int f\,dG$$

for all $f \in \mathcal{L}_0$. Then

$$F(a) \le \int f_{a,\varepsilon}\,dF = \int f_{a,\varepsilon}\,dG \le G(a + \varepsilon)$$

and, conversely,

$$G(a) \le F(a + \varepsilon)$$

for any $\varepsilon > 0$. Taking $a$ to be a point of continuity of both $F$ and $G$, we obtain that

$$F(a) = G(a).$$

Since this is valid for all points of continuity, we get $F = G$.
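
The sandwich $F(a) \le \int f_{a,\varepsilon}\,dF \le F(a + \varepsilon)$ that underlies this example can be checked numerically. The following sketch assumes, purely for illustration, that $F$ is the uniform distribution on $[0, 1]$ (so that $F(x) = x$ there); the function names and the integration routine are our own.

```python
def f(a, eps, x):
    """The function f_{a,eps}: 1 for x <= a, 0 for x >= a + eps, linear in between."""
    if x <= a:
        return 1.0
    if x >= a + eps:
        return 0.0
    return (a + eps - x) / eps

def expect_uniform(g, steps=100_000):
    """Midpoint-rule approximation of E g(xi) for xi uniform on [0, 1]."""
    h = 1.0 / steps
    return sum(g((j + 0.5) * h) for j in range(steps)) * h

a, eps = 0.3, 0.2
value = expect_uniform(lambda x: f(a, eps, x))
print(a, value, a + eps)  # F(a) <= E f(xi) <= F(a + eps), with F(x) = x on [0, 1]
```

Letting `eps` shrink squeezes the middle value towards $F(a)$, which is how the integrals over this two-parameter family recover the distribution function at its continuity points.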

One can easily verify in a similar way that the class $\hat{\mathcal{L}}_0$ of “trapezium-shaped”
functions $f(x) = \min(f_{b,\varepsilon}, 1 - f_{a,\varepsilon})$, $a < b$, is also distribution determining.


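Example 6.3.2 The class $\mathcal{L}_1$ of continuous bounded functions such that, for each
$f \in \mathcal{L}_0$ (or $f \in \hat{\mathcal{L}}_0$), there exists a sequence $f_n \in \mathcal{L}_1$, $\sup_x |f_n(x)| < M < \infty$, for
which $\lim_{n \to \infty} f_n(x) = f(x)$ for each $x \in \mathbb{R}$.

Let

$$\int f\,dF = \int f\,dG$$

for all $f \in \mathcal{L}_1$. By the dominated convergence theorem,

$$\lim_{n \to \infty} \int f_n\,dF = \int f\,dF, \qquad \lim_{n \to \infty} \int f_n\,dG = \int f\,dG, \qquad f \in \mathcal{L}_0.$$

Therefore

$$\int f\,dF = \int f\,dG, \quad f \in \mathcal{L}_0, \qquad F = G,$$

and hence $\mathcal{L}_1$ is a distribution determining class.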


Example 6.3.3 The class $C_k$ of all bounded functions $f(x)$ having bounded uniformly continuous $k$-th derivatives $f^{(k)}(x)$ ($\sup_x |f^{(k)}(x)| < \infty$), $k \ge 1$.

It is evident that $C_k$ is a distribution determining class for it is a special case of
an $\mathcal{L}_1$ class.

In the same way one can see that the subclass of $C_k$ consisting of functions having finite support (vanishing outside a finite interval) is also distribution determining.
This follows from the fact that this subclass is an $\mathcal{L}_1$-class with respect to the class $\hat{\mathcal{L}}_0$ of
trapezium-shaped (and therefore having compact support) functions.

It is clear that the class $C_k$ satisfies condition (3) from Corollary 6.3.2 ($f \equiv 1 \in C_k$). Therefore, to prove convergence $F_n \Rightarrow F \in \mathcal{F}$ it suffices to verify convergence (6.3.5) for $f \in C_k$ only.

If one takes $\mathcal{L}$ to be the class of differentiable functions with finite support then relation (6.3.5) together with condition (2) of Corollary 6.3.2 could be
re-written as

$$\int F_n f'\,dx \to \int F f'\,dx, \qquad F \in \mathcal{F}. \qquad (6.3.6)$$

(One has to integrate (6.3.5) by parts and use the fact that $f'$ also has a finite support.) The convergence criterion (6.3.6) is sometimes useful. It can be used to show,
for example, that (6.3.5) follows from convergence $F_n(x) \to F(x)$ at all points of
continuity of $F$ (i.e. almost everywhere), since that convergence and the dominated
convergence theorem imply (6.3.6) which is equivalent to (6.3.5).

Example 6.3.4 One of the most important distribution determining classes is the
one-parameter family of complex-valued functions $\{e^{itx}\}$, $t \in \mathbb{R}$.

The next chapter will be devoted to studying the properties of $\int e^{itx}\,dF(x)$.

After  obvious  changes,  all  the  material  in  the  present  chapter  can  be  extended  to 
the  multivariate  case. 


Chapter  7 

Characteristic  Functions 


Abstract  Section  7.1  begins  with  formal  definitions  and  contains  an  extensive  dis¬ 
cussion  of  the  basic  properties  of  characteristic  functions,  including  those  related  to 
the  nature  of  the  underlying  distributions.  Section  7.2  presents  the  proofs  of  the  in¬ 
version  formulas  for  both  densities  and  distribution  functions,  and  also  in  the  space 
of  square  integrable  functions.  Then  the  fundamental  continuity  theorem  relating 
pointwise  convergence  of  characteristic  functions  to  weak  convergence  of  the  re¬ 
spective  distributions  is  proved  in  Sect.  7.3.  The  result  is  illustrated  by  proving  the 
Poisson  theorem,  with  a  bound  for  the  convergence  rate,  in  Sect.  7.4.  After  that, 
the  previously  presented  theory  is  extended  in  Sect.  7.5  to  the  multivariate  case. 
Some  applications  of  characteristic  functions  are  discussed  in  Sect.  7.6,  including 
the  stability  properties  of  the  normal  and  Cauchy  distributions  and  an  in-depth  dis¬ 
cussion  of  the  gamma  distribution  and  its  properties.  Section  7.7  introduces  the 
concept  of  generating  functions  and  uses  it  to  analyse  the  asymptotic  behaviour 
of  a  simple  Markov  discrete  time  branching  process.  The  obtained  results  include 
the  formula  for  the  eventual  extinction  probability,  the  asymptotic  behaviour  of  the 
non-extinction  probabilities  in  the  critical  case,  and  convergence  in  that  case  of  the 
conditional  distributions  of  the  scaled  population  size  given  non-extinction  to  the 
exponential  law. 


7.1  Definition  and  Properties  of  Characteristic  Functions 

As a preliminary remark, note that together with real-valued random variables $\xi(\omega)$
we could also consider complex-valued random variables, by which we mean functions of the form $\xi_1(\omega) + i\xi_2(\omega)$, $(\xi_1, \xi_2)$ being a random vector. It is natural to
put $\mathbf{E}(\xi_1 + i\xi_2) = \mathbf{E}\xi_1 + i\mathbf{E}\xi_2$. Complex-valued random variables $\xi = \xi_1 + i\xi_2$ and
$\eta = \eta_1 + i\eta_2$ are independent if the $\sigma$-algebras $\sigma(\xi_1, \xi_2)$ and $\sigma(\eta_1, \eta_2)$ generated
by the vectors $(\xi_1, \xi_2)$ and $(\eta_1, \eta_2)$, respectively, are independent. It is not hard to
verify that, for such random variables,

$$\mathbf{E}\xi\eta = \mathbf{E}\xi\,\mathbf{E}\eta.$$

Definition 7.1.1 The characteristic function (ch.f.) of a real-valued random variable
$\xi$ is the complex-valued function

$$\varphi_\xi(t) := \mathbf{E} e^{it\xi} = \int e^{itx}\,dF(x),$$

where $t$ is real.

If the distribution function $F(x)$ has a density $f(x)$ then the ch.f. is equal to

$$\varphi_\xi(t) = \int e^{itx} f(x)\,dx$$

and is just the Fourier transform¹ of the function $f(x)$. In the general case, the ch.f.
is the Fourier-Stieltjes transform of the function $F(x)$.

The ch.f. exists for any random variable $\xi$. This follows immediately from the
relation

$$|\varphi_\xi(t)| \le \int |e^{itx}|\,dF(x) \le \int 1\,dF(x) = 1.$$


Ch.f.s  are  a  powerful  tool  for  studying  properties  of  the  sums  of  independent  random 
variables. 


7.1.1  Properties  of  Characteristic  Functions 

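1. For any random variable $\xi$,

$$\varphi_\xi(0) = 1 \quad \text{and} \quad |\varphi_\xi(t)| \le 1 \quad \text{for all } t.$$

This property is obvious.

2. For any random variable $\xi$,

$$\varphi_{a\xi + b}(t) = e^{itb} \varphi_\xi(ta).$$

Indeed,

$$\varphi_{a\xi + b}(t) = \mathbf{E} e^{it(a\xi + b)} = e^{itb}\,\mathbf{E} e^{iat\xi} = e^{itb} \varphi_\xi(ta).$$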


¹ More precisely, in classical mathematical analysis, the Fourier transform $\varphi(t)$ of a function $f(x)$
from the space $L_1$ of integrable functions is defined by the equation

$$\varphi(t) = \frac{1}{\sqrt{2\pi}} \int e^{itx} f(x)\,dx$$

(the difference from ch.f. consists in the factor $1/\sqrt{2\pi}$). Under this definition the inversion formula
has a “symmetric” form: if $\varphi \in L_1$ then

$$f(x) = \frac{1}{\sqrt{2\pi}} \int e^{-itx} \varphi(t)\,dt.$$

This representation is more symmetric than the inversion formula for ch.f. (7.2.1) in Sect. 7.2
below.

3. If $\xi_1, \ldots, \xi_n$ are independent random variables then the ch.f. of the sum $S_n = \xi_1 + \cdots + \xi_n$ is equal to

$$\varphi_{S_n}(t) = \varphi_{\xi_1}(t) \cdots \varphi_{\xi_n}(t).$$

Proof This follows from the properties of the expectation of the product of independent random variables. Indeed,

$$\varphi_{S_n}(t) = \mathbf{E} e^{it(\xi_1 + \cdots + \xi_n)} = \mathbf{E} e^{it\xi_1} e^{it\xi_2} \cdots e^{it\xi_n} = \mathbf{E} e^{it\xi_1}\,\mathbf{E} e^{it\xi_2} \cdots \mathbf{E} e^{it\xi_n} = \varphi_{\xi_1}(t) \varphi_{\xi_2}(t) \cdots \varphi_{\xi_n}(t).$$ □

Thus to the convolution $F_{\xi_1} * F_{\xi_2}$ there corresponds the product $\varphi_{\xi_1} \varphi_{\xi_2}$.
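
The correspondence between convolution and multiplication of ch.f.s can be verified directly for discrete laws. The sketch below is an illustration only; the two distributions (a fair die and a biased coin) and the helper names are assumptions made for the example.

```python
import cmath

def chf(pmf, t):
    """Characteristic function E exp(i*t*xi) of a discrete law given as {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

def convolve(pmf1, pmf2):
    """Law of the sum of two independent discrete random variables."""
    out = {}
    for x, p in pmf1.items():
        for y, q in pmf2.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

die = {k: 1 / 6 for k in range(1, 7)}
coin = {0: 0.3, 1: 0.7}
t = 0.9
lhs = chf(convolve(die, coin), t)   # ch.f. of the sum, from its distribution
rhs = chf(die, t) * chf(coin, t)    # product of the two ch.f.s
print(abs(lhs - rhs))               # zero up to rounding error
```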


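4. The ch.f. $\varphi_\xi(t)$ is a uniformly continuous function.
Indeed, as $h \to 0$,

$$|\varphi(t + h) - \varphi(t)| = \big|\mathbf{E}(e^{i(t+h)\xi} - e^{it\xi})\big| \le \mathbf{E}|e^{ih\xi} - 1| \to 0$$

by the dominated convergence theorem (see Corollary 6.1.2) since $|e^{ih\xi} - 1| \xrightarrow{p} 0$
as $h \to 0$, and $|e^{ih\xi} - 1| \le 2$. □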


5. If the $k$-th moment exists: $\mathbf{E}|\xi|^k < \infty$, $k \ge 1$, then there exists a continuous $k$-th
derivative of the function $\varphi_\xi(t)$, and $\varphi^{(k)}(0) = i^k \mathbf{E}\xi^k$.


Proof Indeed, since

$$\Big| \int ixe^{itx}\,dF(x) \Big| \le \int |x|\,dF(x) = \mathbf{E}|\xi| < \infty,$$

the integral $\int ixe^{itx}\,dF(x)$ converges uniformly in $t$. Therefore one can differentiate
under the integral sign:

$$\varphi'(t) = i \int xe^{itx}\,dF(x), \qquad \varphi'(0) = i\mathbf{E}\xi.$$

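Further, one can argue by induction. If, for $l < k$,

$$\varphi^{(l)}(t) = i^l \int x^l e^{itx}\,dF(x),$$

then

$$\varphi^{(l+1)}(t) = i^{l+1} \int x^{l+1} e^{itx}\,dF(x)$$

by the uniform convergence of the integral on the right-hand side. Therefore

$$\varphi^{(l+1)}(0) = i^{l+1} \mathbf{E}\xi^{l+1}.$$ □

Property 5 implies that if $\mathbf{E}|\xi|^k < \infty$ then, in a neighbourhood of the point $t = 0$,
one has the expansion

$$\varphi_\xi(t) = 1 + \sum_{j=1}^{k} \frac{(it)^j}{j!} \mathbf{E}\xi^j + o(|t^k|). \qquad (7.1.1)$$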

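The converse assertion is only partially true:


If a derivative of an even order $\varphi^{(2k)}$ exists then

$$\mathbf{E}|\xi|^{2k} < \infty, \qquad \varphi^{(2k)}(0) = (-1)^k \mathbf{E}\xi^{2k}.$$

We will prove the property for $k = 1$ (for $k > 1$ one can employ induction). It
suffices to verify that $\mathbf{E}|\xi|^2$ is finite. One has

$$\frac{2\varphi(0) - \varphi(2h) - \varphi(-2h)}{4h^2} = -\mathbf{E}\Big(\frac{e^{ih\xi} - e^{-ih\xi}}{2h}\Big)^2 = \mathbf{E}\frac{\sin^2 h\xi}{h^2}.$$

Since $h^{-2} \sin^2 h\xi \to \xi^2$ as $h \to 0$, by Fatou’s lemma

$$-\varphi''(0) = \lim_{h \to 0} \frac{2\varphi(0) - \varphi(2h) - \varphi(-2h)}{4h^2} = \lim_{h \to 0} \mathbf{E}\frac{\sin^2 h\xi}{h^2} \ge \mathbf{E}\lim_{h \to 0} \frac{\sin^2 h\xi}{h^2} = \mathbf{E}\xi^2.$$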


□ 


6. If $\xi \ge 0$ then $\varphi_\xi(\lambda)$ is defined in the complex plane for $\operatorname{Im}\lambda \ge 0$. Moreover,
$|\varphi_\xi(\lambda)| \le 1$ for such $\lambda$, and in the domain $\operatorname{Im}\lambda > 0$, $\varphi_\xi(\lambda)$ is analytic and continuous including on the boundary $\operatorname{Im}\lambda = 0$.


Proof That $\varphi(\lambda)$ is analytic follows from the fact that, for $\operatorname{Im}\lambda > 0$, one can differentiate under the integral sign the right-hand side of

$$\varphi_\xi(\lambda) = \int_0^\infty e^{i\lambda x}\,dF(x).$$

(For $\operatorname{Im}\lambda > 0$ the integrand decreases exponentially fast as $x \to \infty$.) □

Continuity is proved in the same way as in property 4. This means that for nonnegative $\xi$ the ch.f. $\varphi_\xi(\lambda)$ uniquely determines the function

$$\psi(s) = \varphi_\xi(is) = \mathbf{E} e^{-s\xi}$$

of real variable $s \ge 0$, which is called the Laplace (or Laplace-Stieltjes) transform
of the distribution of $\xi$.

The converse assertion also follows from properties of analytic functions: the
Laplace transform $\psi(s)$ on the half-line $s > 0$ uniquely determines the ch.f. $\varphi_\xi(\lambda)$.

7. $\overline{\varphi_\xi(t)} = \varphi_\xi(-t) = \varphi_{-\xi}(t)$, where the bar denotes the complex conjugate.

Proof The relations follow from the equalities

$$\overline{\varphi_\xi(t)} = \overline{\mathbf{E} e^{it\xi}} = \mathbf{E}\,\overline{e^{it\xi}} = \mathbf{E} e^{-it\xi}.$$ □


This implies the following property.


7A. If $\xi$ is symmetric (has the same distribution as $-\xi$) then its ch.f. is real ($\varphi_\xi(t) = \varphi_\xi(-t)$).
One  can  show  that  the  converse  is  also  true;  to  this  end  one  has  to  make  use  of 
the  uniqueness  theorem  to  be  discussed  below. 

Now  we  will  find  the  ch.f.s  of  the  basic  probability  laws. 

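Example 7.1.1 If $\xi = a$ with probability 1, i.e. $\xi \mathrel{\subset\!\!=} \mathbf{I}_a$, then $\varphi_\xi(t) = e^{ita}$.

Example 7.1.2 If $\xi \mathrel{\subset\!\!=} \mathbf{B}_p$ then $\varphi_\xi(t) = pe^{it} + (1 - p) = 1 + p(e^{it} - 1)$.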


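Example 7.1.3 If $\xi \mathrel{\subset\!\!=} \boldsymbol{\Phi}_{0,1}$ then $\varphi_\xi(t) = e^{-t^2/2}$.

Indeed,

$$\varphi(t) = \varphi_\xi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{itx - x^2/2}\,dx.$$

Differentiating with respect to $t$ and integrating by parts ($xe^{-x^2/2}\,dx = -de^{-x^2/2}$),
we get

$$\varphi'(t) = \frac{1}{\sqrt{2\pi}} \int ixe^{itx - x^2/2}\,dx = -\frac{1}{\sqrt{2\pi}} \int te^{itx - x^2/2}\,dx = -t\varphi(t),$$

$$(\ln \varphi(t))' = -t, \qquad \ln \varphi(t) = -\frac{t^2}{2} + c.$$

Since $\varphi(0) = 1$, one has $c = 0$ and $\varphi(t) = e^{-t^2/2}$.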


□ 
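
The closed form obtained in Example 7.1.3 can be compared with a direct numerical evaluation of the defining integral. The sketch below is only an illustration; the integration range, step count and function name are assumptions.

```python
import cmath
import math

def normal_chf_numeric(t, half_width=12.0, steps=24_000):
    """Midpoint-rule approximation of (2*pi)**-0.5 * integral of exp(i*t*x - x*x/2) dx
    over [-half_width, half_width]; the tails beyond that range are negligible."""
    h = 2.0 * half_width / steps
    total = 0j
    for j in range(steps):
        x = -half_width + (j + 0.5) * h
        total += cmath.exp(1j * t * x - x * x / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)

for t in (0.0, 0.5, 2.0):
    # difference between the numerical integral and exp(-t**2 / 2)
    print(t, abs(normal_chf_numeric(t) - math.exp(-t * t / 2.0)))
```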


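Now let $\eta$ be a normal random variable with parameters $(a, \sigma^2)$. Then it can be
represented as $\eta = \sigma\xi + a$, where $\xi$ is normally distributed with parameters (0, 1).

The ch.f. of $\eta$ can be found using Property 2:

$$\varphi_\eta(t) = e^{ita} e^{-(t\sigma)^2/2} = e^{ita - t^2\sigma^2/2}.$$

Differentiating $\varphi_\eta(t)$ for $\eta \mathrel{\subset\!\!=} \boldsymbol{\Phi}_{0,\sigma^2}$, we will obtain that $\mathbf{E}\eta^k = 0$ for odd $k$, and

$$\mathbf{E}\eta^k = \sigma^k (k - 1)(k - 3) \cdots 1 \quad \text{for } k = 2, 4, \ldots.$$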


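Example 7.1.4 If $\xi \mathrel{\subset\!\!=} \boldsymbol{\Pi}_\mu$ then

$$\varphi_\xi(t) = \mathbf{E} e^{it\xi} = \sum_k e^{itk} \frac{\mu^k}{k!} e^{-\mu} = e^{-\mu} \sum_k \frac{(\mu e^{it})^k}{k!} = e^{-\mu} e^{\mu e^{it}} = \exp[\mu(e^{it} - 1)].$$

Example 7.1.5 If $\xi$ has the exponential distribution $\boldsymbol{\Gamma}_\alpha$ with density $\alpha e^{-\alpha x}$ for
$x \ge 0$, then

$$\varphi_\xi(t) = \alpha \int_0^\infty e^{itx - \alpha x}\,dx = \frac{\alpha}{\alpha - it}.$$

Therefore, if $\xi$ has the “double” exponential distribution with density $\frac{1}{2} e^{-|x|}$, $-\infty < x < \infty$, then

$$\varphi_\xi(t) = \frac{1}{2}\Big(\frac{1}{1 - it} + \frac{1}{1 + it}\Big) = \frac{1}{1 + t^2}.$$

If $\xi$ has the geometric distribution $\mathbf{P}(\xi = k) = (1 - p)p^k$, $k = 0, 1, \ldots$, then

$$\varphi_\xi(t) = \frac{1 - p}{1 - pe^{it}}.$$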

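Example 7.1.6 If $\xi \mathrel{\subset\!\!=} \mathbf{K}_{0,1}$ (has the density $[\pi(1 + x^2)]^{-1}$) then $\varphi_\xi(t) = e^{-|t|}$. The
reader will easily be able to prove this somewhat later, using the inversion formula
and Example 7.1.5.


Example 7.1.7 If $\xi \mathrel{\subset\!\!=} \mathbf{U}_{0,1}$, then

$$\varphi_\xi(t) = \int_0^1 e^{itx}\,dx = \frac{e^{it} - 1}{it}.$$

By virtue of Property 3, the ch.f.s of the sums $\xi_1 + \xi_2$, $\xi_1 + \xi_2 + \xi_3$, ... that we
considered in Example 3.6.1 will be equal to

$$\varphi_{\xi_1 + \xi_2}(t) = -\frac{(e^{it} - 1)^2}{t^2}, \qquad \varphi_{\xi_1 + \xi_2 + \xi_3}(t) = \frac{i(e^{it} - 1)^3}{t^3}, \qquad \ldots$$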

We return to the general case. How can one verify whether one or another function $\varphi$ is characteristic or not? Sometimes one can do this using the above properties.
We suggest that the reader determine whether the functions $(1 + t)^{-1}$, $1 + t$, $\sin t$, $\cos t$
are characteristic, and if so, to which distributions they correspond.

In  the  general  case  the  posed  question  is  a  difficult  one.  We  state  without  proof 
one  of  the  known  results. 


Bochner-Khinchin’s Theorem A necessary and sufficient condition for a continuous function $\varphi(t)$ with $\varphi(0) = 1$ to be characteristic is that it is nonnegatively
defined, i.e., for any real $t_1, \ldots, t_n$ and complex $\lambda_1, \ldots, \lambda_n$, one has

$$\sum_{k,j=1}^{n} \varphi(t_k - t_j) \lambda_k \overline{\lambda}_j \ge 0$$

($\overline{\lambda}$ is the complex conjugate of $\lambda$).

Note that the necessity of this condition is almost obvious, for if $\varphi(t) = \mathbf{E} e^{it\xi}$
then

$$\sum_{k,j=1}^{n} \varphi(t_k - t_j) \lambda_k \overline{\lambda}_j = \mathbf{E} \sum_{k,j=1}^{n} e^{i(t_k - t_j)\xi} \lambda_k \overline{\lambda}_j = \mathbf{E} \Big| \sum_{k=1}^{n} \lambda_k e^{it_k \xi} \Big|^2 \ge 0.$$


7.1.2 The Properties of Ch.F.s Related to the Structure of the
Distribution of $\xi$


8. If the distribution of $\xi$ has a density then $\varphi_\xi(t) \to 0$ as $|t| \to \infty$.

This is a direct consequence of the Lebesgue theorem on Fourier transforms. The
converse assertion is false.

In general, the smoother $F(x)$ is the faster $\varphi_\xi(t)$ vanishes as $|t| \to \infty$. The formulas in Example 7.1.7 are typical in this respect. If the density $f(x)$ has an integrable $k$-th derivative then, by integrating by parts, we get

$$\varphi_\xi(t) = \int e^{itx} f(x)\,dx = \frac{i}{t} \int e^{itx} f'(x)\,dx = \cdots = \Big(\frac{i}{t}\Big)^k \int e^{itx} f^{(k)}(x)\,dx,$$

which implies that

$$|\varphi_\xi(t)| \le \frac{c}{|t|^k}.$$


8A. If the distribution of $\xi$ has a density of bounded variation then

$$|\varphi_\xi(t)| \le \frac{c}{|t|}.$$

This property is also validated by integration by parts:

$$|\varphi_\xi(t)| = \Big| \frac{1}{it} \int e^{itx}\,df(x) \Big| \le \frac{1}{|t|} \int |df(x)|.$$


9. A random variable $\xi$ has a lattice distribution with span $h > 0$ (see Definition 3.2.3) if and only if

$$\Big|\varphi_\xi\Big(\frac{2\pi}{h}\Big)\Big| = 1, \qquad \Big|\varphi_\xi\Big(\frac{v}{h}\Big)\Big| < 1 \qquad (7.1.2)$$

if $v$ is not a multiple of $2\pi$.

Clearly, without loss of generality we can assume $h = 1$. Moreover, since

$$|\varphi_{\xi - a}(t)| = |e^{-ita} \varphi_\xi(t)| = |\varphi_\xi(t)|,$$

the properties (7.1.2) are invariant with respect to the shift by $a$. Thus we can assume the shift $a$ is equal to zero and thus change the lattice distribution condition
in Property 9 to the arithmeticity condition (see Definition 3.2.3). Since $\varphi_\xi(t)$ is a
periodic function, Property 9 can be rewritten in the following equivalent form:

The distribution of a random variable $\xi$ is arithmetic if and only if

$$\varphi_\xi(2\pi) = 1, \qquad |\varphi_\xi(t)| < 1 \quad \text{for all } t \in (0, 2\pi). \qquad (7.1.3)$$

Proof If $\xi$ has an arithmetic distribution then

$$\varphi_\xi(t) = \sum_k \mathbf{P}(\xi = k) e^{itk} = 1$$

for $t = 2\pi$. Now let us prove the second relation in (7.1.3). Assume the contrary:
for some $v \in (0, 2\pi)$, we have $|\varphi_\xi(v)| = 1$ or, which is the same,

$$\varphi_\xi(v) = e^{ibv}$$

for some real $b$. The last relation implies that

$$\varphi_{\xi - b}(v) = 1 = \mathbf{E}\cos v(\xi - b) + i\mathbf{E}\sin v(\xi - b), \qquad \mathbf{E}[1 - \cos v(\xi - b)] = 0.$$

Hence, by Property E4 in Sect. 4.1, $\cos v(\xi - b) = 1$ and $v(\xi - b) = 2\pi k(\omega)$ with
probability 1, where $k(\omega)$ is an integer. Thus $\xi - b$ is a multiple of $2\pi/v > 1$.
This contradicts the assumption that the span of the lattice equals 1, and hence
proves (7.1.3).

Conversely, let (7.1.3) hold. As we saw, the first relation in (7.1.3) implies that
$\xi$ takes only integer values. If we assume that the lattice span equals $h > 1$ then,
by the first part of the proof and the first relation in (7.1.2), we get $|\varphi(2\pi/h)| = 1$,
which contradicts the second relation in (7.1.3), since $2\pi/h \in (0, 2\pi)$. Property 9 is proved. □
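
The criterion for arithmetic distributions can be probed numerically on a small example. The sketch below is illustrative only: the distribution on {0, 1, 3} (integer-valued, with lattice span 1), the grid of 999 points and the helper name are assumptions, and a grid check is of course not a proof.

```python
import cmath
import math

def chf(pmf, t):
    """Characteristic function of a discrete law given as {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

pmf = {0: 0.2, 1: 0.5, 3: 0.3}  # integer-valued, lattice span 1
at_2pi = chf(pmf, 2 * math.pi)
largest_inside = max(abs(chf(pmf, 2 * math.pi * j / 1000)) for j in range(1, 1000))
print(abs(at_2pi - 1), largest_inside)  # the first is ~0, the second stays below 1

doubled = {2 * x: p for x, p in pmf.items()}  # values 0, 2, 6: lattice span 2
at_pi = abs(chf(doubled, math.pi))
print(at_pi)  # modulus 1 at t = 2*pi/2, a point inside (0, 2*pi)
```

The second law takes integer values but has span 2, and its ch.f. reaches modulus 1 strictly inside $(0, 2\pi)$, which is the situation the second half of the proof excludes.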

The  next  definition  looks  like  a  tautology. 

Definition  7.1.2  The  distribution  of  §  is  called  non-lattice  if  it  is  not  a  lattice  distri¬ 
bution. 

10. If the distribution of $\xi$ is non-lattice then

$$|\varphi_\xi(t)| < 1 \quad \text{for all } t \ne 0.$$

Proof Indeed, if we assume the contrary, i.e. that $|\varphi(u)| = 1$ for some $u \ne 0$, then,
by Property 9, we conclude that the distribution of $\xi$ is a lattice with span $h = 2\pi/u$
or with a lesser span. □

11. If the distribution of $\xi$ has an absolutely continuous component of a positive
mass $p > 0$, then it is clearly non-lattice and, moreover,

$$\limsup_{|t| \to \infty} |\varphi_\xi(t)| \le 1 - p.$$

This assertion follows from Property 8.

Arithmetic distributions occupy an important place in the class of lattice distributions.

For arithmetic distributions, the ch.f. $\varphi_\xi(t)$ is a function of the variable $z = e^{it}$
and is periodic in $t$ with period $2\pi$. Hence, in this case it is sufficient to know the
behaviour of the ch.f. on the interval $[-\pi, \pi]$ or, which is the same, to know the
behaviour of the function

$$p_\xi(z) := \mathbf{E} z^{\xi} = \sum_k z^k\, \mathbf{P}(\xi = k)$$

on the unit circle $|z| = 1$.

Definition 7.1.3 The function $p_\xi(z)$ is called the generating function of the random
variable $\xi$ (or of the distribution of $\xi$).

Since $p_\xi(e^{it}) = \varphi_\xi(t)$ is a ch.f., all the properties of ch.f.s remain valid for
generating functions, with the only changes corresponding to the change of variable. For
more on applications of generating functions, see Sect. 7.7.


7.2  Inversion  Formulas 

Thus for any random variable there exists a corresponding ch.f. We will now show
that the set $\mathcal{L}$ of functions $e^{itx}$ is a distribution determining class, i.e. that the
distribution can be uniquely reconstructed from its ch.f. This is proved using inversion
formulas.


7.2.1 The Inversion Formula for Densities


Theorem 7.2.1 If the ch.f. $\varphi(t)$ of a random variable $\xi$ is integrable then the
distribution of $\xi$ has the bounded density

$$f(x) = \frac{1}{2\pi} \int e^{-itx} \varphi(t)\, dt. \tag{7.2.1}$$


This  fact  is  known  from  classical  Fourier  analysis,  but  we  shall  give  a  proof  of  a 
probabilistic  character. 
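
As a rough numerical illustration of the statement of Theorem 7.2.1, the sketch below approximates the inversion integral by a trapezoid sum over a truncated interval. The function name `invert_density`, the truncation level `T` and the grid size `n` are illustrative choices and not part of the text. The two test ch.f.s (the normal law with parameters (0, 1) and the standard Cauchy law) are integrable, so the theorem applies to both.

```python
import cmath
import math

def invert_density(phi, x, T=40.0, n=40000):
    """Approximate f(x) = (1/2pi) * int exp(-i t x) phi(t) dt by a trapezoid sum on [-T, T]."""
    h = 2.0 * T / n
    total = 0.0
    for k in range(n + 1):
        t = -T + k * h
        weight = 0.5 if k in (0, n) else 1.0
        # The integrand at -t is the complex conjugate of the integrand at t,
        # so the imaginary parts cancel and only the real part is summed.
        total += weight * (cmath.exp(-1j * t * x) * phi(t)).real
    return total * h / (2.0 * math.pi)

def phi_normal(t):
    # ch.f. of the normal law with parameters (0, 1)
    return math.exp(-t * t / 2.0)

def phi_cauchy(t):
    # ch.f. of the standard Cauchy law
    return math.exp(-abs(t))

x = 0.7
normal_density = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
cauchy_density = 1.0 / (math.pi * (1.0 + x * x))
print(invert_density(phi_normal, x), normal_density)
print(invert_density(phi_cauchy, x), cauchy_density)
```

In both cases the numerical value should agree with the known density up to the discretisation and truncation errors of the quadrature.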


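Proof First we will establish the following (Parseval's) identity: for any fixed $\varepsilon > 0$,

$$p_\varepsilon(t) := \frac{1}{2\pi} \int e^{-itu} \varphi(u) e^{-\varepsilon^2 u^2/2}\, du
= \frac{1}{\sqrt{2\pi}\,\varepsilon} \int \exp\Big\{ -\frac{(u - t)^2}{2\varepsilon^2} \Big\} F(du), \tag{7.2.2}$$

where $F$ is the distribution of $\xi$. We begin with the equality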


$$\frac{1}{\sqrt{2\pi}} \int \exp\Big\{ ix\,\frac{\xi - t}{\varepsilon} - \frac{x^2}{2} \Big\}\, dx
= \exp\Big\{ -\frac{(\xi - t)^2}{2\varepsilon^2} \Big\}, \tag{7.2.3}$$

both sides of which being the value of the ch.f. of the normal distribution with
parameters (0, 1) at the point $(\xi - t)/\varepsilon$. After changing the variable $x = \varepsilon u$, the
left-hand side of this equality can be rewritten as

$$\frac{\varepsilon}{\sqrt{2\pi}} \int \exp\Big\{ iu(\xi - t) - \frac{\varepsilon^2 u^2}{2} \Big\}\, du.$$
If we take expectations of both sides of (7.2.3), we obtain

$$\frac{\varepsilon}{\sqrt{2\pi}} \int e^{-iut} \varphi(u) e^{-\varepsilon^2 u^2/2}\, du
= \int \exp\Big\{ -\frac{(u - t)^2}{2\varepsilon^2} \Big\} F(du).$$

This proves (7.2.2).

To prove the theorem first consider the left-hand side of the equality (7.2.2). Since
$e^{-\varepsilon^2 u^2/2} \to 1$ as $\varepsilon \to 0$, $|e^{-\varepsilon^2 u^2/2}| \le 1$ and $\varphi(u)$ is integrable, as $\varepsilon \to 0$ one has

$$p_\varepsilon(t) \to \frac{1}{2\pi} \int e^{-itu} \varphi(u)\, du = p_0(t) \tag{7.2.4}$$

uniformly in $t$, because the integral on the left-hand side of (7.2.2) is uniformly
continuous in $t$. This implies, in particular, that

$$\int_a^b p_\varepsilon(t)\, dt \to \int_a^b p_0(t)\, dt. \tag{7.2.5}$$


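Now consider the right-hand side of (7.2.2). It represents the density of the sum
$\xi + \varepsilon\eta$, where $\xi$ and $\eta$ are independent and $\eta \mathrel{\subset\!\!\!=} \Phi_{0,1}$. Therefore

$$\int_a^b p_\varepsilon(t)\, dt = \mathbf{P}(a \le \xi + \varepsilon\eta < b). \tag{7.2.6}$$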


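Since $\xi + \varepsilon\eta \xrightarrow{p} \xi$ as $\varepsilon \to 0$ and the limit $\lim_{\varepsilon \to 0} \int_a^b p_\varepsilon(t)\, dt$ exists for any fixed $a$ and $b$ by
virtue of (7.2.5), this limit (see (7.2.6)) cannot be anything other than $F([a, b))$.
Thus, from (7.2.5) and (7.2.6) we get

$$\int_a^b p_0(t)\, dt = F([a, b)).$$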

This means that the distribution $F$ has the density $p_0(t)$, which is defined by
relation (7.2.4). The boundedness of $p_0(t)$ evidently follows from the integrability
of $\varphi$:

$$p_0(t) \le \frac{1}{2\pi} \int |\varphi(t)|\, dt < \infty.$$

The theorem is proved. □

7.2.2 The Inversion Formula for Distributions


Theorem 7.2.2 If $F(x)$ is the distribution function of a random variable $\xi$ and $\varphi(t)$
is its ch.f., then, for any points of continuity $x$ and $y$ of the function $F(x)$,² ³

$$F(y) - F(x) = \frac{1}{2\pi} \lim_{\sigma \to 0} \int \frac{e^{-itx} - e^{-ity}}{it}\, \varphi(t) e^{-t^2\sigma^2}\, dt. \tag{7.2.7}$$

If the function $\varphi(t)/t$ is integrable at infinity then the passage to the limit under the
integral sign is justified and one can write

$$F(y) - F(x) = \frac{1}{2\pi} \int \frac{e^{-itx} - e^{-ity}}{it}\, \varphi(t)\, dt. \tag{7.2.8}$$


Proof Suppose first that the ch.f. $\varphi(t)$ is integrable. Then $F(x)$ has a density $f(x)$
and the assertion of the theorem in the form (7.2.8) follows if we integrate both sides
of Eq. (7.2.1) over the interval with the end points $x$ and $y$ and change the order of
integration (which is valid because of the absolute convergence).

Now let $\varphi(t)$ be the characteristic function of a random variable $\xi$ with an
arbitrary distribution $F$. On a common probability space with $\xi$, consider a random
variable $\eta$ which is independent of $\xi$ and has the normal distribution with parameters
$(0, 2\sigma^2)$. As we have already pointed out, the ch.f. of $\eta$ is $e^{-t^2\sigma^2}$.
This means that the ch.f. of $\xi + \eta$, being equal to $\varphi(t) e^{-t^2\sigma^2}$, is integrable.
Therefore by (7.2.8) one will have

$$F_{\xi+\eta}(y) - F_{\xi+\eta}(x) = \frac{1}{2\pi} \int \frac{e^{-itx} - e^{-ity}}{it}\, \varphi(t) e^{-t^2\sigma^2}\, dt. \tag{7.2.9}$$

Since $\eta \xrightarrow{p} 0$ as $\sigma \to 0$, we have $F_{\xi+\eta} \Rightarrow F$ (see Chap. 6). Therefore, if $x$ and $y$ are
points of continuity of $F$, then $F(y) - F(x) = \lim_{\sigma \to 0} (F_{\xi+\eta}(y) - F_{\xi+\eta}(x))$. This,
together with (7.2.9), proves the assertion of the theorem. □


In the proof of Theorem 7.2.2 we used a method which might be called the
"smoothing" of distributions. It is often employed to overcome technical difficulties
related to the inversion formula.
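
The smoothing device can also be tried numerically. The following sketch, under the assumption of a fixed small `sigma` instead of the limit $\sigma \to 0$, evaluates the integral in (7.2.7) by the midpoint rule for a random variable taking the values 0 and 1 with probability 1/2 each. The names `smoothed_increment` and `phi_coin` and all numerical parameters are illustrative. An even number of midpoints is used so that the point $t = 0$, where the integrand has a removable singularity, is never hit.

```python
import cmath
import math

def smoothed_increment(phi, x, y, sigma=0.05, n=24000):
    """Midpoint-rule value of (1/2pi) * int (e^{-itx} - e^{-ity}) / (it) * phi(t) * e^{-t^2 sigma^2} dt."""
    T = 6.0 / sigma                 # beyond |t| = T the Gaussian factor is below exp(-36)
    h = 2.0 * T / n
    total = 0.0
    for k in range(n):
        t = -T + (k + 0.5) * h      # midpoints never coincide with t = 0 when n is even
        kernel = (cmath.exp(-1j * t * x) - cmath.exp(-1j * t * y)) / (1j * t)
        total += (kernel * phi(t)).real * math.exp(-(t * sigma) ** 2)
    return total * h / (2.0 * math.pi)

def phi_coin(t):
    # ch.f. of a random variable equal to 0 or 1 with probability 1/2 each
    return 0.5 * (1.0 + cmath.exp(1j * t))

# x = -0.5 and y = 0.5 are continuity points of F, and F(y) - F(x) = 1/2.
print(smoothed_increment(phi_coin, -0.5, 0.5))
```

For a fixed `sigma` the computed value is the increment of the distribution function of $\xi + \eta$, which is already very close to $F(y) - F(x)$ when `sigma` is small compared with the distance from $x$ and $y$ to the atoms of $F$.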


Corollary  7.2.1  (Uniqueness  Theorem)  The  ch.f.  of  a  random  variable  uniquely 
determines  its  distribution  function. 


² In the literature, the inversion formula is often given in the form

$$F(y) - F(x) = \frac{1}{2\pi} \lim_{A \to \infty} \int_{-A}^{A} \frac{e^{-itx} - e^{-ity}}{it}\, \varphi(t)\, dt,$$

which is equivalent to (7.2.7).

³ Formula (7.2.8) can also be obtained from (7.2.1) without integration by noting that
$(F(y) - F(x))/(y - x)$ is the value at zero of the convolution of two densities: $f(x)$ and the
uniform density over the interval $[-y, -x]$ (see also the remark at the end of Sect. 3.6). The ch.f.
of the convolution is equal to $\dfrac{e^{-itx} - e^{-ity}}{(y - x)\,it}\, \varphi(t)$.

The proof of Corollary 7.2.1 follows from the inversion formula and the fact that $F$ is uniquely
determined by the differences $F(y) - F(x)$.

For lattice random variables the inversion formula becomes simpler. Let, for the
sake of simplicity, $\xi$ be an integer-valued random variable.

Theorem 7.2.3 If $p_\xi(z) := \mathbf{E} z^{\xi}$ is the generating function of an arithmetic random
variable then

$$\mathbf{P}(\xi = k) = \frac{1}{2\pi i} \int_{|z|=1} p_\xi(z) z^{-k-1}\, dz. \tag{7.2.10}$$

Proof Turning to the ch.f. $\varphi_\xi(t) = \sum_j e^{itj}\, \mathbf{P}(\xi = j)$ and changing the variables $z = e^{it}$
in (7.2.10) we see that the right-hand side of (7.2.10) equals

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-itk} \varphi_\xi(t)\, dt = \frac{1}{2\pi} \sum_j \mathbf{P}(\xi = j) \int_{-\pi}^{\pi} e^{it(j-k)}\, dt.$$

Here all the integrals on the right-hand side are equal to zero, except for the integral
with $j = k$ which is equal to $2\pi$. Thus the right-hand side itself equals $\mathbf{P}(\xi = k)$.
The theorem is proved. □
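
A short computational sketch of the inversion formula for arithmetic distributions: the integral over $[-\pi, \pi]$ is replaced by an equally spaced Riemann sum, which reproduces the integral exactly (up to rounding) whenever the number of nodes `N` exceeds the range of values of $\xi$. The binomial law with parameters $n = 5$, $p = 0.3$ and the name `pmf_from_chf` are illustrative assumptions.

```python
import cmath
import math

def pmf_from_chf(phi, k, N=64):
    """Approximate (1/2pi) * int_{-pi}^{pi} exp(-i t k) phi(t) dt by an N-point Riemann sum."""
    total = 0.0
    for m in range(N):
        t = -math.pi + 2.0 * math.pi * m / N
        total += (cmath.exp(-1j * t * k) * phi(t)).real
    return total / N

n, p = 5, 0.3

def phi_binomial(t):
    # ch.f. of the binomial law with parameters (n, p)
    return (1.0 - p + p * cmath.exp(1j * t)) ** n

for k in range(n + 1):
    exact = math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
    print(k, pmf_from_chf(phi_binomial, k), exact)
```

The agreement reflects the orthogonality of the functions $e^{itj}$ used in the proof: on the grid, the sum of $e^{it(j-k)}$ vanishes unless $j - k$ is a multiple of `N`.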


Formula (7.2.10) is nothing else but the formula for Fourier coefficients and has
a simple geometric interpretation. The functions $\{e_k = e^{itk}\}$ form an orthonormal
basis in the Hilbert space $L_2(-\pi, \pi)$ of square integrable complex-valued functions
with the inner product

$$(f, g) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\, \overline{g}(t)\, dt$$

($\overline{g}$ is the complex conjugate of $g$). If $\varphi_\xi = \sum_k e_k \mathbf{P}(\xi = k)$ then it immediately follows
from the equality $\mathbf{P}(\xi = k) = (\varphi_\xi, e_k)$ that

$$\mathbf{P}(\xi = k) = (\varphi_\xi, e_k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-itk} \varphi_\xi(t)\, dt.$$


7.2.3 The Inversion Formula in $L_2$. The Class of Functions that
Are Both Densities and Ch.F.s

First consider some properties of ch.f.s related to the inversion formula. As a
preliminary, note that, in classical Fourier analysis, one also considers the Fourier
transforms of functions $f$ from the space $L_2$ of square-integrable functions. Since in this
case a function $f$ is not necessarily integrable, the Fourier transform is defined as
the integral in the principal value sense:

$$\varphi(t) := \lim_{N \to \infty} \varphi_{(N)}(t), \qquad \varphi_{(N)}(t) := \int_{-N}^{N} e^{itx} f(x)\, dx, \tag{7.2.11}$$

where the limit is taken in the sense of convergence in $L_2$:

$$\int \big| \varphi(t) - \varphi_{(N)}(t) \big|^2\, dt \to 0 \quad \text{as } N \to \infty.$$

Since by Parseval's equality

$$\|f\|_{L_2} = \frac{1}{\sqrt{2\pi}}\, \|\varphi\|_{L_2}, \quad \text{where } \|g\|_{L_2} = \Big[ \int |g|^2(t)\, dt \Big]^{1/2},$$

the Fourier transform maps the space $L_2$ into itself (there is no such isometricity
for Fourier transforms in $L_1$). Here the inversion formula (7.2.1) holds true but the
integral in (7.2.1) is understood in the principal value sense.

Denote by $\mathcal{F}$ and $\mathcal{H}$ the class of all densities and the class of all ch.f.s,
respectively, and by $\mathcal{H}_{1,+} \subset L_1$ the class of nonnegative real-valued integrable ch.f.s,
so that the elements of $\mathcal{H}_{1,+}$ are in $\mathcal{F}$ up to the normalising factors. Further, let
$(\mathcal{H}_{1,+})^{(-1)}$ be the inverse image of the class $\mathcal{H}_{1,+}$ in $\mathcal{F}$ for the mapping $f \to \varphi$,
i.e. the class of densities whose ch.f.s lie in $\mathcal{H}_{1,+}$. It is clear that functions $f$
from $(\mathcal{H}_{1,+})^{(-1)}$ and $\varphi$ from $\mathcal{H}_{1,+}$ are necessarily symmetric (see Property 7A in
Sect. 7.1) and that $f(0) \in (0, \infty)$. The last relation follows from the fact that, by the
inversion formula for $\varphi \in \mathcal{H}_{1,+}$, we have

$$\|\varphi\| := \|\varphi\|_{L_1} = \int \varphi(t)\, dt = 2\pi f(0).$$


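Further, denote by $(\mathcal{H}_{1,+})_{\|\cdot\|}$ the class of normalised functions $\varphi/\|\varphi\|$, $\varphi \in \mathcal{H}_{1,+}$, so
that $(\mathcal{H}_{1,+})_{\|\cdot\|} \subset \mathcal{F}$, and denote by $\mathcal{F}^{(2,*)}$ the class of convolutions of symmetric
densities from $L_2$:

$$\mathcal{F}^{(2,*)} := \big\{ f^{(2)*}(x) : f \in L_2,\ f \text{ is symmetric} \big\},$$

where

$$f^{(2)*}(x) = \int f(t) f(x - t)\, dt.$$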


Theorem 7.2.4 The following relations hold true:

$$(\mathcal{H}_{1,+})^{(-1)} = (\mathcal{H}_{1,+})_{\|\cdot\|}, \qquad \mathcal{F}^{(2,*)} \subset (\mathcal{H}_{1,+})_{\|\cdot\|}.$$

The class $(\mathcal{H}_{1,+})_{\|\cdot\|}$ may be called the class of densities conjugate to $f \in (\mathcal{H}_{1,+})^{(-1)}$.
It turns out that this class coincides with the inverse image $(\mathcal{H}_{1,+})^{(-1)}$.
The second statement of the theorem shows that this inverse image is a very rich
class and provides sufficient conditions for the density $f$ to have a conjugate. We
will need these conditions in Sect. 8.7.

⁴ Here we again omit the factor $1/\sqrt{2\pi}$ (cf. the footnote on page 154).


Proof of Theorem 7.2.4 Let $f \in (\mathcal{H}_{1,+})^{(-1)}$. Then the corresponding ch.f. $\varphi$ is in
$\mathcal{H}_{1,+}$ and the inversion formula (7.2.1) is applicable. Multiplying its right-hand side
by $2\pi/\|\varphi\|$, we obtain an expression for the ch.f. (at the point $-x$) of the density
$\varphi/\|\varphi\|$ (recall that $\varphi \ge 0$ is symmetric if $\varphi \in \mathcal{H}_{1,+}$). This means that $2\pi f/\|\varphi\|$ is a ch.f. and,
moreover, that $f \in (\mathcal{H}_{1,+})_{\|\cdot\|}$.

Conversely, suppose that $f^* := \varphi/\|\varphi\| \in (\mathcal{H}_{1,+})_{\|\cdot\|}$. Then $f^* \in \mathcal{F}$, $f^*$ is symmetric, and
the inversion formula can be applied to $\varphi$:

$$f(x) = \frac{1}{2\pi} \int e^{-itx} \varphi(t)\, dt = \frac{1}{2\pi} \int e^{itx} \varphi(t)\, dt, \qquad
\frac{2\pi f(t)}{\|\varphi\|} = \int e^{itx} f^*(x)\, dx.$$

Since the ch.f. $\varphi^*(t) := \dfrac{2\pi f(t)}{\|\varphi\|}$ belongs to $\mathcal{H}_{1,+}$, one has $f^* \in (\mathcal{H}_{1,+})^{(-1)}$.

We now prove the second assertion. Suppose that $f \in L_2$. Then $\varphi \in L_2$ and
$\varphi^2 \in L_1$. Moreover, by virtue of the symmetry of $f$ and Property 7A in Sect. 7.1,
the function $\varphi$ is real-valued, so $\varphi^2 \ge 0$. This implies that $\varphi^2 \in \mathcal{H}_{1,+}$. Since $\varphi^2$ is
the ch.f. of the density $f^{(2)*}$, we have $f^{(2)*} \in (\mathcal{H}_{1,+})^{(-1)}$. The theorem is proved. □


Note that any bounded density $f$ belongs to $L_2$. Indeed, since the Lebesgue
measure of $\{x : f(x) \ge 1\}$ never exceeds 1, for $f(\cdot) \le N$ we have

$$\|f\|_{L_2}^2 = \int f^2(x)\, dx \le \int_{f(x) < 1} f(x)\, dx + N^2 \int_{f(x) \ge 1} dx \le 1 + N^2.$$

□


Thus  we  have  obtained  the  following  result. 


Corollary 7.2.2 For any bounded symmetric density $f$, the convolution $f^{(2)*}$ is, up
to a constant factor, the ch.f. of a random variable.


Example 7.2.1 The "triangle" density

$$g(x) = \begin{cases} 1 - |x| & \text{if } |x| \le 1, \\ 0 & \text{if } |x| > 1, \end{cases}$$

being the convolution of the two uniform distributions on $[-1/2, 1/2]$ (cf.
Example 3.6.1) is also a ch.f. We suggest that the reader verify that the preimage of this
ch.f. is the density

$$f(x) = \frac{1}{2\pi}\, \frac{\sin^2(x/2)}{(x/2)^2}$$

(the density conjugate to $g$). Conversely, the density $g$ is conjugate to $f$, and the
functions $2\pi f(t)$ and $g(t)$ will be ch.f.s for $g$ and $f$, respectively.

These  assertions  will  be  useful  in  Sect.  8.7. 
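
The relation between the triangle density and its conjugate can be checked numerically. The sketch below computes the ch.f. of $g(x) = (1 - |x|)^+$ by the midpoint rule and compares it with $2\pi f(t)$, where $f$ is taken to be $f(x) = \frac{1}{2\pi}\,\sin^2(x/2)/(x/2)^2$. The function names and the grid size are illustrative assumptions.

```python
import math

def triangle_chf(t, n=20000):
    """Midpoint-rule value of int_{-1}^{1} (1 - |x|) cos(t x) dx, the ch.f. of the triangle density."""
    h = 2.0 / n
    total = 0.0
    for k in range(n):
        x = -1.0 + (k + 0.5) * h
        # The sine part of exp(i t x) integrates to zero because the density is symmetric.
        total += (1.0 - abs(x)) * math.cos(t * x)
    return total * h

def conjugate_density(x):
    """f(x) = (1/2pi) * (sin(x/2) / (x/2))**2, extended by continuity to x = 0."""
    if x == 0.0:
        return 1.0 / (2.0 * math.pi)
    return (math.sin(x / 2.0) / (x / 2.0)) ** 2 / (2.0 * math.pi)

for t in (0.5, 1.0, 3.0):
    print(t, triangle_chf(t), 2.0 * math.pi * conjugate_density(t))
```

The two columns should coincide up to the quadrature error, in agreement with the statement that $2\pi f(t)$ is the ch.f. of $g$.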


7.3  The  Continuity  (Convergence)  Theorem 



Let $\{\varphi_n(t)\}_{n=1}^{\infty}$ be a sequence of ch.f.s and $\{F_n\}_{n=1}^{\infty}$ the sequence of the respective
distribution functions. Recall that the symbol $\Rightarrow$ denotes the weak convergence of
distributions introduced in Chap. 6.

Theorem 7.3.1 (The Continuity Theorem) A necessary and sufficient condition for
the convergence $F_n \Rightarrow F$ as $n \to \infty$ is that $\varphi_n(t) \to \varphi(t)$ for any $t$, $\varphi(t)$ being the
ch.f. corresponding to $F$.


The  theorem  follows  in  an  obvious  way  from  Corollary  6.3.2  (here  two  of  the 
three  sufficient  conditions  from  Corollary  6.3.2  are  satisfied:  conditions  (2)  and  (3)). 
The  proof  of  the  theorem  can  be  obtained  in  a  simpler  way  as  well.  This  way  is 
presented  in  Sect.  7.4  of  the  previous  editions  of  this  book. 
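
To see the hypothesis of Theorem 7.3.1 at work, one can tabulate the distance between ch.f.s in a familiar case. The sketch below, with illustrative names and an arbitrarily chosen grid of points $t$, compares the ch.f. of the binomial law with parameters $(n, \lambda/n)$ with the ch.f. of the Poisson law with parameter $\lambda$.

```python
import cmath

lam = 2.0

def phi_binomial(t, n):
    # ch.f. of the binomial law with parameters (n, lam / n)
    return (1.0 + (lam / n) * (cmath.exp(1j * t) - 1.0)) ** n

def phi_poisson(t):
    # ch.f. of the Poisson law with parameter lam
    return cmath.exp(lam * (cmath.exp(1j * t) - 1.0))

grid = [-3.0 + 0.25 * k for k in range(25)]

def max_gap(n):
    """Largest distance between the two ch.f.s over the grid."""
    return max(abs(phi_binomial(t, n) - phi_poisson(t)) for t in grid)

for n in (10, 100, 1000, 10000):
    print(n, max_gap(n))
```

The gaps decrease as $n$ grows, which by the continuity theorem corresponds to the weak convergence of the binomial laws to the Poisson law.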

In Sect. 7.1, for nonnegative random variables $\xi$ we introduced the notion of
the Laplace transform $\psi(s) := \mathbf{E} e^{-s\xi}$. Let $\psi_n(s)$ and $\psi(s)$ be Laplace transforms
corresponding to $F_n$ and $F$. The following analogue of Theorem 7.3.1 holds for
Laplace transforms:

In order that $F_n \Rightarrow F$ as $n \to \infty$ it is necessary and sufficient that $\psi_n(s) \to \psi(s)$
for each $s \ge 0$.

Just as in Theorem 7.3.1, this assertion follows from Corollary 6.3.2, since the
class $\{f(x) = e^{-sx},\ s \ge 0\}$ is (like $\{e^{itx}\}$) a distribution determining class (see
Property 6 in Sect. 7.1) and, moreover, the sufficient conditions (2) and (3) of
Corollary 6.3.2 are satisfied.

Theorem 7.3.1 has a deficiency: one needs to know in advance that the function
$\varphi(t)$ to which the ch.f.s converge is a ch.f. itself. However, one could have no
such prior information (see e.g. Sect. 8.8). In this connection there arises a natural
question: under what conditions will the limiting function $\varphi(t)$ be characteristic?

The answer to this question is given by the following theorem.

Theorem 7.3.2 Let

$$\varphi_n(t) = \int e^{itx}\, dF_n(x)$$

be a sequence of ch.f.s and $\varphi_n(t) \to \varphi(t)$ as $n \to \infty$ for any $t$.
Then the following three conditions are equivalent:

(a) $\varphi(t)$ is a ch.f.;

(b) $\varphi(t)$ is continuous at $t = 0$;

(c) the sequence $\{F_n\}$ is tight.


Thus if we establish that $\varphi_n(t) \to \varphi(t)$ and one of the above three conditions is
met, then we can assert that there exists a distribution $F$ such that $\varphi$ is the ch.f. of
$F$ and $F_n \Rightarrow F$.

Proof The equivalence of conditions (a) and (c) follows from Theorem 6.3.2. That
(a) implies (b) is known. It remains to establish that (c) follows from (b). First we
will show that the following lemma is true.


Lemma 7.3.1 If $\varphi$ is the ch.f. of $\xi$ then, for any $u > 0$,

$$\mathbf{P}\Big( |\xi| > \frac{2}{u} \Big) \le \frac{1}{u} \int_{-u}^{u} \big[ 1 - \varphi(t) \big]\, dt.$$

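Proof The right-hand side of this inequality is equal to

$$\frac{1}{u} \int_{-u}^{u} \int_{-\infty}^{\infty} \big( 1 - e^{-itx} \big)\, dF(x)\, dt,$$

where $F$ is the distribution function of $\xi$. Changing the order of integration and
noting that

$$\int_{-u}^{u} \big( 1 - e^{-itx} \big)\, dt = \Big( t + \frac{e^{-itx}}{ix} \Big) \Big|_{-u}^{u} = 2u \Big( 1 - \frac{\sin ux}{ux} \Big),$$

we obtain that

$$\frac{1}{u} \int_{-u}^{u} \big[ 1 - \varphi(t) \big]\, dt = 2 \int_{-\infty}^{\infty} \Big( 1 - \frac{\sin ux}{ux} \Big)\, dF(x)
\ge 2 \int_{|x| > 2/u} \Big( 1 - \Big| \frac{\sin ux}{ux} \Big| \Big)\, dF(x)$$

$$\ge 2 \int_{|x| > 2/u} \Big( 1 - \frac{1}{|ux|} \Big)\, dF(x) \ge \int_{|x| > 2/u} dF(x).$$

The lemma is proved. □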
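
The inequality of Lemma 7.3.1 is easy to test on a distribution for which both sides are available. In the sketch below the standard Cauchy law is used, with ch.f. $e^{-|t|}$ and $\mathbf{P}(|\xi| > a) = 1 - \frac{2}{\pi}\arctan a$. The helper names and the quadrature are illustrative assumptions.

```python
import math

def tail_prob(u):
    """P(|xi| > 2/u) for the standard Cauchy law."""
    return 1.0 - (2.0 / math.pi) * math.atan(2.0 / u)

def lemma_bound(u, n=20000):
    """Midpoint-rule value of (1/u) * int_{-u}^{u} [1 - exp(-|t|)] dt."""
    h = 2.0 * u / n
    total = 0.0
    for k in range(n):
        t = -u + (k + 0.5) * h
        total += 1.0 - math.exp(-abs(t))
    return total * h / u

for u in (0.1, 1.0, 10.0):
    print(u, tail_prob(u), lemma_bound(u))
```

In each row the first number (the tail probability) should not exceed the second (the bound of the lemma). The bound is far from sharp for large $u$, but only its behaviour as $u \to 0$ matters in the proof of Theorem 7.3.2.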


Now suppose that condition (b) is met. By Lemma 7.3.1

$$\limsup_{n \to \infty} \int_{|x| > 2/u} dF_n(x) \le \limsup_{n \to \infty} \frac{1}{u} \int_{-u}^{u} \big[ 1 - \varphi_n(t) \big]\, dt = \frac{1}{u} \int_{-u}^{u} \big[ 1 - \varphi(t) \big]\, dt.$$

Since $\varphi(t)$ is continuous at 0 and $\varphi(0) = 1$, the mean value on the right-hand side can
clearly be made arbitrarily small by choosing sufficiently small $u$. This obviously
means that condition (c) is satisfied. The theorem is proved. □
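
A simple example shows what happens when the conditions of Theorem 7.3.2 fail. For normal laws with parameters $(0, n)$ the ch.f.s $e^{-nt^2/2}$ converge for every $t$, but the limit equals 1 at $t = 0$ and 0 elsewhere, so it is discontinuous at zero, and the sequence of distributions is not tight. The sketch below, with illustrative names, tabulates both effects.

```python
import math

def phi_n(t, n):
    # ch.f. of the normal law with parameters (0, n)
    return math.exp(-n * t * t / 2.0)

def mass_outside(N, n):
    # P(|xi_n| > N) for xi_n normal with parameters (0, n)
    return math.erfc(N / math.sqrt(2.0 * n))

for n in (1, 100, 10000):
    print(n, phi_n(0.0, n), phi_n(0.1, n), mass_outside(50.0, n))
```

As $n$ grows, the value at $t = 0.1$ tends to zero while the value at $t = 0$ stays equal to 1, and the mass outside the fixed interval $[-50, 50]$ does not vanish.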


Using ch.f.s one can not only establish convergence of distribution functions but
also estimate the rate of this convergence in the cases when one can estimate how
fast $\varphi_n - \varphi$ vanishes. We will encounter respective examples in Sect. 7.5.

We  will  mostly  use  the  machinery  of  ch.f.s  in  Chaps.  8,  12  and  17.  In  the  present 
chapter  we  will  also  touch  upon  some  applications  of  ch.f.s,  but  they  will  only  serve 
as  illustrations. 


7.4 The Application of Characteristic Functions in the Proof
of the Poisson Theorem

Let $\xi_1, \ldots, \xi_n$ be independent integer-valued random variables,

$$S_n = \sum_{k=1}^{n} \xi_k, \qquad \mathbf{P}(\xi_k = 1) = p_k, \qquad \mathbf{P}(\xi_k = 0) = 1 - p_k - q_k.$$

The theorem below is a generalisation of the theorems established in Sect. 5.4.⁵

Theorem 7.4.1 One has

$$\big| \mathbf{P}(S_n = k) - \Pi_\mu(\{k\}) \big| \le \sum_{k=1}^{n} p_k^2 + 2 \sum_{k=1}^{n} q_k, \quad \text{where } \mu = \sum_{k=1}^{n} p_k.$$


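Thus, if one is given a triangular array $\xi_{1n}, \xi_{2n}, \ldots, \xi_{nn}$, $n = 1, 2, \ldots$, of
independent integer-valued random variables,

$$S_n = \sum_{k=1}^{n} \xi_{kn}, \qquad \mathbf{P}(\xi_{kn} = 1) = p_{kn}, \qquad \mathbf{P}(\xi_{kn} = 0) = 1 - p_{kn} - q_{kn}, \qquad \mu = \sum_{k=1}^{n} p_{kn},$$

then a sufficient condition for convergence of the difference $\mathbf{P}(S_n = k) - \Pi_\mu(\{k\})$
to zero is that

$$\sum_{k=1}^{n} q_{kn} \to 0, \qquad \sum_{k=1}^{n} p_{kn}^2 \to 0.$$

Since

$$\sum_{k=1}^{n} p_{kn}^2 \le \mu \max_{k \le n} p_{kn},$$

the last condition is always met if

$$\max_{k \le n} p_{kn} \to 0, \qquad \mu \le \mu_0 = \text{const}.$$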


⁵ This extension is not really substantial since close results could be established using
Theorem 5.2.2 in which $\xi_k$ can only take the values 0 and 1. It suffices to observe that the probability of
the event $A = \bigcup_k \{\xi_k \ne 0,\ \xi_k \ne 1\}$ is bounded by the sum $\sum q_k$ and therefore

$$\mathbf{P}(S_n = k) = \theta_1 \sum q_k + \Big( 1 - \theta_2 \sum q_k \Big) \mathbf{P}(S_n = k \mid \overline{A}), \qquad \theta_i \le 1,\ i = 1, 2,$$

where $\mathbf{P}(S_n = k \mid \overline{A}) = \mathbf{P}(S_n^* = k)$ and $S_n^*$ are sums of independent random variables $\xi_k^*$ with

$$\mathbf{P}(\xi_k^* = 1) = p_k^* = \frac{p_k}{1 - q_k}, \qquad \mathbf{P}(\xi_k^* = 0) = 1 - p_k^*.$$

To prove the theorem we will need two auxiliary assertions.


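Lemma 7.4.1 If $\operatorname{Re} \beta \le 0$ then

$$|e^{\beta} - 1| \le |\beta|, \qquad |e^{\beta} - 1 - \beta| \le |\beta|^2/2, \qquad |e^{\beta} - 1 - \beta - \beta^2/2| \le |\beta|^3/6.$$

Proof The first two inequalities follow from the relations (we use here the change
of variables $t = \beta v$ and the fact that $|e^{s}| \le 1$ for $\operatorname{Re} s \le 0$)

$$|e^{\beta} - 1| = \Big| \int_0^{\beta} e^{t}\, dt \Big| = \Big| \beta \int_0^1 e^{\beta v}\, dv \Big| \le |\beta|,$$

$$|e^{\beta} - 1 - \beta| = \Big| \int_0^{\beta} (e^{t} - 1)\, dt \Big| = \Big| \beta \int_0^1 (e^{\beta v} - 1)\, dv \Big| \le |\beta|^2 \int_0^1 v\, dv = \frac{|\beta|^2}{2}.$$

The last inequality is proved in the same way. □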

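Lemma 7.4.2 If $|a_k| \le 1$, $|b_k| \le 1$, $k = 1, \ldots, n$, then

$$\Big| \prod_{k=1}^{n} a_k - \prod_{k=1}^{n} b_k \Big| \le \sum_{k=1}^{n} |a_k - b_k|.$$

Thus if $\varphi_k(t)$ and $\theta_k(t)$ are ch.f.s then, for any $t$,

$$\Big| \prod_{k=1}^{n} \varphi_k(t) - \prod_{k=1}^{n} \theta_k(t) \Big| \le \sum_{k=1}^{n} |\varphi_k(t) - \theta_k(t)|.$$

Proof Put $A_n = \prod_{k=1}^{n} a_k$ and $B_n = \prod_{k=1}^{n} b_k$. Then $|A_n| \le 1$, $|B_n| \le 1$, and

$$|A_n - B_n| = |A_{n-1} a_n - B_{n-1} b_n| = \big| (A_{n-1} - B_{n-1}) a_n + (a_n - b_n) B_{n-1} \big| \le |A_{n-1} - B_{n-1}| + |a_n - b_n|.$$

Applying this inequality $n$ times, we obtain the required relation. □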


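Proof of Theorem 7.4.1 One has

$$\varphi_k(t) := \mathbf{E} e^{it\xi_k} = 1 + p_k (e^{it} - 1) + q_k (\gamma_k(t) - 1),$$

where $\gamma_k(t)$ is the ch.f. of some integer-valued random variable. By independence
of the random variables $\xi_k$,

$$\varphi_{S_n}(t) = \prod_{k=1}^{n} \varphi_k(t).$$

Let further $\zeta \mathrel{\subset\!\!\!=} \Pi_\mu$. Then

$$\varphi_\zeta(t) = \mathbf{E} e^{it\zeta} = e^{\mu(e^{it} - 1)} = \prod_{k=1}^{n} \theta_k(t),$$

where $\theta_k(t) = e^{p_k(e^{it} - 1)}$. Therefore the difference between the ch.f.s $\varphi_{S_n}$ and $\varphi_\zeta$ can
be bounded by Lemma 7.4.2 as follows:

$$|\varphi_{S_n}(t) - \varphi_\zeta(t)| = \Big| \prod_{k=1}^{n} \varphi_k - \prod_{k=1}^{n} \theta_k \Big| \le \sum_{k=1}^{n} |\varphi_k - \theta_k|,$$

where by Lemma 7.4.1 (note that $\operatorname{Re}(e^{it} - 1) \le 0$)

$$\big| \theta_k(t) - 1 - p_k(e^{it} - 1) \big| \le \frac{p_k^2 |e^{it} - 1|^2}{2} = \frac{p_k^2}{2} \big( \sin^2 t + (1 - \cos t)^2 \big)
= p_k^2 \Big( \frac{\sin^2 t}{2} + 2 \sin^4 \frac{t}{2} \Big), \tag{7.4.1}$$

$$\sum_{k=1}^{n} |\varphi_k - \theta_k| \le 2 \sum_{k=1}^{n} q_k + \sum_{k=1}^{n} p_k^2 \Big( \frac{\sin^2 t}{2} + 2 \sin^4 \frac{t}{2} \Big).$$

It remains to make use of the inversion formula (7.2.10):

$$\big| \mathbf{P}(S_n = k) - \Pi_\mu(\{k\}) \big| \le \Big| \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ikt} \big( \varphi_{S_n}(t) - \varphi_\zeta(t) \big)\, dt \Big|$$

$$\le \frac{1}{\pi} \int_0^{\pi} \Big[ 2 \sum_{k=1}^{n} q_k + \sum_{k=1}^{n} p_k^2 \Big( \frac{\sin^2 t}{2} + 2 \sin^4 \frac{t}{2} \Big) \Big]\, dt
= 2 \sum_{k=1}^{n} q_k + \sum_{k=1}^{n} p_k^2,$$

for

$$\frac{1}{2\pi} \int_0^{\pi} \sin^2 t\, dt = \frac{1}{4}, \qquad \frac{2}{\pi} \int_0^{\pi} \sin^4 \frac{t}{2}\, dt = \frac{3}{4}.$$

The theorem is proved. □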


If one makes use of the inequality $|e^{it} - 1| \le 2$ in (7.4.1), the computations will
be simplified, there will be no need to calculate the last two integrals, but the bounds
will be somewhat worse:

$$\sum |\varphi_k - \theta_k| \le 2 \Big( \sum q_k + \sum p_k^2 \Big),$$

$$\big| \mathbf{P}(S_n = k) - \Pi_\mu(\{k\}) \big| \le 2 \Big( \sum q_k + \sum p_k^2 \Big).$$
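
The bound of Theorem 7.4.1 can be compared with the actual error in a small case. In the sketch below all $q_k$ are assumed to be zero (so that the $\xi_k$ are indicators), the exact distribution of $S_n$ is obtained by successive convolution, and the probabilities `ps` are an arbitrary illustrative choice.

```python
import math

def sum_distribution(ps):
    """Exact law of S_n for independent indicators with P(xi_k = 1) = ps[k] (all q_k = 0)."""
    dist = [1.0]
    for p in ps:
        new = [0.0] * (len(dist) + 1)
        for s, mass in enumerate(dist):
            new[s] += mass * (1.0 - p)      # xi_k = 0
            new[s + 1] += mass * p          # xi_k = 1
        dist = new
    return dist

def poisson_pmf(mu, k):
    return math.exp(-mu) * mu**k / math.factorial(k)

ps = [0.05, 0.1, 0.02, 0.08, 0.03, 0.07, 0.04, 0.06]
mu = sum(ps)
dist = sum_distribution(ps)
worst = max(abs(dist[k] - poisson_pmf(mu, k)) for k in range(len(dist)))
bound = sum(p * p for p in ps)
print(worst, bound)
```

The first printed number is the largest deviation $|\mathbf{P}(S_n = k) - \Pi_\mu(\{k\})|$ over $k = 0, \ldots, n$, and the second is the bound $\sum p_k^2$ of the theorem.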


7.5 Characteristic Functions of Multivariate Distributions.
The Multivariate Normal Distribution

Definition 7.5.1 Given a random vector $\xi = (\xi_1, \xi_2, \ldots, \xi_d)$, its ch.f. (the ch.f. of
its distribution) is defined as the function of the vector variable $t = (t_1, \ldots, t_d)$ equal
to

$$\varphi_\xi(t) := \mathbf{E} e^{it\xi^T} = \mathbf{E} e^{i(t, \xi)} = \mathbf{E} \exp\Big\{ i \sum_{k=1}^{d} t_k \xi_k \Big\}
= \int \exp\Big\{ i \sum_{k=1}^{d} t_k x_k \Big\} F_\xi(dx_1, \ldots, dx_d),$$

where $\xi^T$ is the transpose of $\xi$ (a column vector), and $(t, \xi)$ is the inner product.


The ch.f.s of multivariate distributions possess all the properties (with obvious
amendments of their statements) listed in Sects. 7.1-7.3.

It is clear that $\varphi_\xi(0) = 1$ and that $|\varphi_\xi(t)| \le 1$ and $\varphi_\xi(-t) = \overline{\varphi_\xi(t)}$ always hold.
Further, $\varphi_\xi(t)$ is everywhere continuous. If there exists a mixed moment $\mathbf{E} \xi_1^{k_1} \cdots \xi_d^{k_d}$
then $\varphi_\xi$ has the respective derivative of order $k_1 + \cdots + k_d$:

$$\frac{\partial^{k_1 + \cdots + k_d} \varphi_\xi(t)}{\partial t_1^{k_1} \cdots \partial t_d^{k_d}} \bigg|_{t=0} = i^{k_1 + \cdots + k_d}\, \mathbf{E} \xi_1^{k_1} \cdots \xi_d^{k_d}.$$

If all the moments of some order exist, then an expansion of the function $\varphi_\xi(t)$
similar to (7.1.1) is valid in a neighbourhood of the point $t = 0$.

If $\varphi_\xi(t)$ is known, then the ch.f. of any subcollection of the random variables
$(\xi_{k_1}, \ldots, \xi_{k_j})$ can obviously be obtained by setting all $t_k$ except $t_{k_1}, \ldots, t_{k_j}$ to be
equal to 0.

The  following  theorems  are  simple  extensions  of  their  univariate  analogues. 


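Theorem 7.5.1 (The Inversion Formula) If $\Delta$ is a parallelepiped defined by the
inequalities $a_k < x_k < b_k$, $k = 1, \ldots, d$, and the probability $\mathbf{P}(\xi \in \Delta)$ is continuous
on the faces of the parallelepiped, then

$$\mathbf{P}(\xi \in \Delta) = \lim_{\sigma \to 0} \frac{1}{(2\pi)^d} \int \cdots \int \Big( \prod_{k=1}^{d} \frac{e^{-it_k a_k} - e^{-it_k b_k}}{it_k}\, e^{-t_k^2 \sigma^2} \Big) \varphi_\xi(t)\, dt_1 \cdots dt_d.$$

If the random vector $\xi$ has a density $f(x)$ and its ch.f. $\varphi_\xi(t)$ is integrable, then
the inversion formula can be written in the form

$$f(x) = \frac{1}{(2\pi)^d} \int e^{-i(t, x)} \varphi_\xi(t)\, dt.$$

If a function $g(x)$ is such that its Fourier transform

$$\widetilde{g}(t) = \int e^{i(t, x)} g(x)\, dx$$

is integrable (and this is always the case for sufficiently smooth $g(x)$) then the
Parseval equality holds:

$$\mathbf{E} g(\xi) = \mathbf{E}\, \frac{1}{(2\pi)^d} \int e^{-i(t, \xi)}\, \widetilde{g}(t)\, dt = \frac{1}{(2\pi)^d} \int \varphi_\xi(-t)\, \widetilde{g}(t)\, dt.$$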


As before, the inversion formula implies the theorem on one-to-one correspondence
between ch.f.s and distribution functions and together with it the fact that
$\{e^{i(t, x)}\}$ is a distribution determining class (cf. Definition 6.3.2).

The weak convergence of distributions $\mathbf{F}_{(n)}(B)$ in the $d$-dimensional space to a
distribution $\mathbf{F}(B)$ is defined in the same way as in the univariate case: $\mathbf{F}_{(n)} \Rightarrow \mathbf{F}$ if

$$\int f(x)\, \mathbf{F}_{(n)}(dx) \to \int f(x)\, \mathbf{F}(dx)$$

for any continuous and bounded function $f(x)$.

Denote by $\varphi_n(t)$ and $\varphi(t)$ the ch.f.s of distributions $\mathbf{F}_{(n)}$ and $\mathbf{F}$, respectively.

Theorem 7.5.2 (Continuity Theorem) A necessary and sufficient condition for the
weak convergence $\mathbf{F}_{(n)} \Rightarrow \mathbf{F}$ is that, for any $t$, $\varphi_n(t) \to \varphi(t)$ as $n \to \infty$.

In the case where one can establish convergence of $\varphi_n(t)$ to some function $\varphi(t)$,
there arises the question of whether $\varphi(t)$ will be the ch.f. of some distribution, or,
which is the same, whether the sequence $\mathbf{F}_{(n)}$ will converge weakly to some
distribution $\mathbf{F}$. Answers to these questions are given by the following assertion. Let $\Delta_N$
be the cube defined by the inequality $\max_k |x_k| < N$.


Theorem 7.5.3 (Continuity Theorem) Suppose a sequence $\varphi_n(t)$ of ch.f.s converges
as $n \to \infty$ to a function $\varphi(t)$ for each $t$. Then the following three conditions are
equivalent:

(a) $\varphi(t)$ is a ch.f.;

(b) $\varphi(t)$ is continuous at the point $t = 0$;

(c) $\limsup_{n \to \infty} \int_{x \notin \Delta_N} \mathbf{F}_{(n)}(dx) \to 0$ as $N \to \infty$.

All  three  theorems  from  this  section  can  be  proved  in  the  same  way  as  in  the 
univariate  case. 


Example 7.5.1 The multivariate normal distribution is defined as a distribution with
density (see Sect. 3.3)

$$f_\xi(x) = \frac{|A|^{1/2}}{(2\pi)^{d/2}}\, e^{-Q(x)/2},$$

where

$$Q(x) = xAx^T = \sum_{i,j=1}^{d} a_{ij} x_i x_j,$$

and $|A|$ is the determinant of a positive definite matrix $A = \|a_{ij}\|$.

This is a centred normal distribution for which $\mathbf{E}\xi = 0$. The distribution of the
vector $\xi + a$ for any constant vector $a$ is also called normal.

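Find the ch.f. of $\xi$. Show that

$$\varphi_\xi(t) = \exp\Big\{ -\frac{t \sigma^2 t^T}{2} \Big\}, \tag{7.5.1}$$

where $\sigma^2 = A^{-1}$ is the matrix inverse to $A$ and coinciding with the covariance
matrix $\|\sigma_{ij}\|$ of $\xi$:

$$\sigma_{ij} = \mathbf{E} \xi_i \xi_j.$$

Indeed,

$$\varphi_\xi(t) = \frac{\sqrt{|A|}}{(2\pi)^{d/2}} \int \cdots \int \exp\Big\{ itx^T - \frac{xAx^T}{2} \Big\}\, dx_1 \cdots dx_d. \tag{7.5.2}$$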


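Choose an orthogonal matrix $C$ such that $CAC^T = D$ is a diagonal matrix, and
denote by $\mu_1, \ldots, \mu_d$ the values of its diagonal elements. Change the variables by
putting $x = yC$ and $t = vC$. Then

$$|A| = |D| = \prod_{k=1}^{d} \mu_k,$$

$$itx^T - \frac{1}{2} xAx^T = ivy^T - \frac{1}{2} yDy^T = i \sum_{k=1}^{d} v_k y_k - \frac{1}{2} \sum_{k=1}^{d} \mu_k y_k^2,$$

and, by Property 2 of ch.f.s of the univariate normal distributions,

$$\varphi_\xi(t) = \frac{\sqrt{|A|}}{(2\pi)^{d/2}} \prod_{k=1}^{d} \int_{-\infty}^{\infty} \exp\Big\{ iv_k y_k - \frac{\mu_k y_k^2}{2} \Big\}\, dy_k
= \sqrt{|A|} \prod_{k=1}^{d} \frac{1}{\sqrt{\mu_k}} \exp\Big\{ -\frac{v_k^2}{2\mu_k} \Big\}$$

$$= \exp\Big\{ -\frac{vD^{-1}v^T}{2} \Big\} = \exp\Big\{ -\frac{tC^T D^{-1} C t^T}{2} \Big\} = \exp\Big\{ -\frac{tA^{-1}t^T}{2} \Big\}.$$

On the other hand, since all the moments of $\xi$ exist, in a neighbourhood of the point
$t = 0$ one has

$$\varphi_\xi(t) = 1 - \frac{1}{2} tA^{-1}t^T + o\Big( \sum t_k^2 \Big) = 1 + it\,\mathbf{E}\xi^T - \frac{1}{2} t\sigma^2 t^T + o\Big( \sum t_k^2 \Big).$$

From this it follows that $\mathbf{E}\xi = 0$, $A^{-1} = \sigma^2$.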


Formula (7.5.1) that we have just proved implies the following property of normal
distributions: the components of the vector $(\xi_1, \ldots, \xi_d)$ are independent if and
only if the correlation coefficients $\rho(\xi_i, \xi_j)$ are zero for all $i \ne j$. Indeed, if $\sigma^2$ is a
diagonal matrix, then $A = \sigma^{-2}$ is also diagonal and $f_\xi(x)$ is equal to the product of
densities. Conversely, if $(\xi_1, \ldots, \xi_d)$ are independent, then $A$ is a diagonal matrix,
and hence $\sigma^2$ is also diagonal.
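
Formula (7.5.1) lends itself to a simple Monte Carlo check. In the sketch below a two-dimensional centred normal vector with an illustrative covariance matrix is simulated from two independent standard normal variables, and the empirical ch.f. at one point is compared with $\exp\{-t\sigma^2 t^T/2\}$. The covariance entries, the seed, the sample size and the function names are all assumptions made for the illustration.

```python
import cmath
import math
import random

random.seed(0)
# Covariance matrix sigma^2 = [[s11, s12], [s12, s22]] (an illustrative choice).
s11, s12, s22 = 2.0, 0.6, 1.0

def sample():
    """One draw of (xi_1, xi_2) built from independent standard normal z1, z2."""
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    x1 = math.sqrt(s11) * z1
    x2 = (s12 / math.sqrt(s11)) * z1 + math.sqrt(s22 - s12 * s12 / s11) * z2
    return x1, x2

def empirical_chf(t1, t2, draws):
    """Sample mean of exp(i (t, xi)) over the simulated vectors."""
    return sum(cmath.exp(1j * (t1 * x1 + t2 * x2)) for x1, x2 in draws) / len(draws)

def normal_chf(t1, t2):
    # exp(-t sigma^2 t^T / 2), as in formula (7.5.1)
    return math.exp(-(s11 * t1 * t1 + 2.0 * s12 * t1 * t2 + s22 * t2 * t2) / 2.0)

draws = [sample() for _ in range(20000)]
print(empirical_chf(0.5, -0.3, draws), normal_chf(0.5, -0.3))
```

The empirical value is complex, with a small imaginary part due to sampling noise. Its distance from the theoretical value is of the order of the inverse square root of the sample size.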


7.6 Other Applications of Characteristic Functions.
The Properties of the Gamma Distribution

7.6.1 Stability of the Distributions $\Phi_{\alpha,\sigma^2}$ and $K_{\alpha,\sigma}$


The stability property means, roughly speaking, that the distribution type is
preserved under summation of random variables (this description of stability is not
exact, for more detail see Sect. 8.8).

The sum of independent normally distributed random variables is also normally
distributed. Indeed, let $\xi_1$ and $\xi_2$ be independent and normally distributed with
parameters $(a_1, \sigma_1^2)$ and $(a_2, \sigma_2^2)$, respectively. Then the ch.f. of $\xi_1 + \xi_2$ is equal to

$$\varphi_{\xi_1 + \xi_2}(t) = \varphi_{\xi_1}(t)\, \varphi_{\xi_2}(t) = \exp\Big\{ ita_1 - \frac{t^2 \sigma_1^2}{2} \Big\} \exp\Big\{ ita_2 - \frac{t^2 \sigma_2^2}{2} \Big\}
= \exp\Big\{ it(a_1 + a_2) - \frac{t^2}{2} (\sigma_1^2 + \sigma_2^2) \Big\}.$$

Thus the sum $\xi_1 + \xi_2$ is again a normal random variable, with parameters
$(a_1 + a_2, \sigma_1^2 + \sigma_2^2)$.

Normality is also preserved when taking sums of dependent random variables
(components of an arbitrary normally distributed random vector). This immediately
follows from the form of the ch.f. of the multivariate normal law found in Sect. 7.5.
One just has to note that to get the ch.f. of the sum $\xi_1 + \cdots + \xi_n$ it suffices to put
$t_1 = \cdots = t_n = t$ in the expression

$$\varphi_{(\xi_1, \ldots, \xi_n)}(t_1, \ldots, t_n) = \mathbf{E} \exp\{ it_1 \xi_1 + \cdots + it_n \xi_n \}.$$

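The sum of independent random variables distributed according to the Poisson
law also has a Poisson distribution. Indeed, consider two independent random
variables $\xi_1 \mathrel{\subset\!\!\!=} \Pi_{\lambda_1}$ and $\xi_2 \mathrel{\subset\!\!\!=} \Pi_{\lambda_2}$. The ch.f. of their sum is equal to

$$\varphi_{\xi_1 + \xi_2}(t) = \exp\{ \lambda_1 (e^{it} - 1) \} \exp\{ \lambda_2 (e^{it} - 1) \} = \exp\{ (\lambda_1 + \lambda_2)(e^{it} - 1) \}.$$

Therefore $\xi_1 + \xi_2 \mathrel{\subset\!\!\!=} \Pi_{\lambda_1 + \lambda_2}$.

The sum of independent random variables distributed according to the Cauchy
law also has a Cauchy distribution. Indeed, if $\xi_1 \mathrel{\subset\!\!\!=} K_{\alpha_1, \sigma_1}$ and $\xi_2 \mathrel{\subset\!\!\!=} K_{\alpha_2, \sigma_2}$, then

$$\varphi_{\xi_1 + \xi_2}(t) = \exp\{ i\alpha_1 t - \sigma_1 |t| \} \exp\{ i\alpha_2 t - \sigma_2 |t| \} = \exp\{ i(\alpha_1 + \alpha_2) t - (\sigma_1 + \sigma_2) |t| \};$$

$$\xi_1 + \xi_2 \mathrel{\subset\!\!\!=} K_{\alpha_1 + \alpha_2,\, \sigma_1 + \sigma_2}.$$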
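
The Poisson case can also be verified directly, without ch.f.s, by convolving the two distributions. The sketch below, with illustrative parameter values and a truncation level `K` chosen for convenience, compares the convolution of the laws $\Pi_{\lambda_1}$ and $\Pi_{\lambda_2}$ with the law $\Pi_{\lambda_1 + \lambda_2}$ on the values $0, \ldots, K - 1$.

```python
import math

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam**k / math.factorial(k)

def convolve(p, q):
    """Probabilities of the sum of two independent nonnegative integer-valued variables,
    for the values 0, ..., len(p) - 1 only."""
    out = [0.0] * len(p)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < len(out):
                out[i + j] += a * b
    return out

lam1, lam2, K = 1.5, 2.5, 40
p1 = [poisson_pmf(lam1, k) for k in range(K)]
p2 = [poisson_pmf(lam2, k) for k in range(K)]
direct = [poisson_pmf(lam1 + lam2, k) for k in range(K)]
gap = max(abs(a - b) for a, b in zip(convolve(p1, p2), direct))
print(gap)
```

Since the probability of the value $k$ of the sum involves only the values $0, \ldots, k$ of the summands, the truncation introduces no error for $k < K$, and the printed gap is at the level of rounding errors.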

The  above  assertions  are  closely  related  to  the  fact  that  the  normal  and  Poisson 
laws  are,  as  we  saw,  limiting  laws  for  sums  of  independent  random  variables  (the 
Cauchy  distribution  has  the  same  property,  see  Sect.  8.8).  Indeed,  if  S2n/V^n  con¬ 
verges  in  distribution  to  a  normal  law  (where  Sk  =  Yf!j=i%j’  §7  are  independent 
and  identically  distributed)  then  it  is  clear  that  Sn/^/n  and  (S2n  ~  Sn)/y/n  will  also 
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converge  to  a  normal  law  so  that  the  sum  of  two  asymptotically  normal  random 
variables  also  has  to  be  asymptotically  normal. 

Note, however, that due to its arithmetic structure the random variable $\xi \subset\!= \Pi_\lambda$
(as opposed to $\xi \subset\!= \Phi_{a,\sigma^2}$ or $\xi \subset\!= K_{\alpha,\sigma}$) cannot be transformed by any normalisation
(linear transformation) into a random variable again having the Poisson distribution
but with another parameter. For this reason the Poisson distribution cannot be stable
in the sense of Definition 8.8.2.

It is not hard to see that the other distributions we have met do not possess the
above-mentioned property of preservation of the distribution type under summation
of random variables. If, for instance, $\xi_1$ and $\xi_2$ are uniformly distributed over
$[0, 1]$ and independent then $F_{\xi_1}$ and $F_{\xi_1+\xi_2}$ are substantially different functions (see
Example 3.6.1).


7.6.2 The Γ-distribution and its properties


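In this subsection we will consider one more rather widespread type of distribution
closely related to the normal distribution and frequently used in applications. This
is the so-called Pearson gamma distribution $\Gamma_{\alpha,\lambda}$. We will write $\xi \subset\!= \Gamma_{\alpha,\lambda}$ if $\xi$ has
density

$$f(x;\alpha,\lambda) = \begin{cases} \dfrac{\alpha^\lambda}{\Gamma(\lambda)}\,x^{\lambda-1}e^{-\alpha x}, & x \ge 0,\\[2mm] 0, & x < 0, \end{cases}$$

depending on two parameters $\alpha > 0$ and $\lambda > 0$, where $\Gamma(\lambda)$ is the gamma function

$$\Gamma(\lambda) = \int_0^\infty x^{\lambda-1}e^{-x}\,dx, \quad \lambda > 0.$$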


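It follows from this equality that $\int f(x;\alpha,\lambda)\,dx = 1$ (one needs to make the variable
change $\alpha x = y$). If one differentiates the ch.f.

$$\varphi(t) = \varphi(t;\alpha,\lambda) = \frac{\alpha^\lambda}{\Gamma(\lambda)}\int_0^\infty x^{\lambda-1}e^{itx-\alpha x}\,dx$$

with respect to $t$ and then integrates by parts, the result will be

$$\varphi'(t) = \frac{\alpha^\lambda}{\Gamma(\lambda)}\int_0^\infty i x^{\lambda}e^{itx-\alpha x}\,dx = \frac{\alpha^\lambda}{\Gamma(\lambda)}\,\frac{i\lambda}{\alpha - it}\int_0^\infty x^{\lambda-1}e^{itx-\alpha x}\,dx = \frac{i\lambda}{\alpha - it}\,\varphi(t);$$

$$(\ln\varphi(t))' = (-\lambda\ln(\alpha - it))', \qquad \varphi(t) = c(\alpha - it)^{-\lambda}.$$

Since $\varphi(0) = 1$ one has $c = \alpha^\lambda$ and $\varphi(t) = (1 - it/\alpha)^{-\lambda}$.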

It follows from the form of the ch.f. that the subfamily of distributions $\Gamma_{\alpha,\lambda}$ for
a fixed $\alpha$ also has a certain stability property: if $\xi_1 \subset\!= \Gamma_{\alpha,\lambda_1}$ and $\xi_2 \subset\!= \Gamma_{\alpha,\lambda_2}$ are
independent, then $\xi_1 + \xi_2 \subset\!= \Gamma_{\alpha,\lambda_1+\lambda_2}$.

An example of a particular gamma distribution is given, for instance, by the
distribution of the random variable

$$\chi_n^2 = \xi_1^2 + \cdots + \xi_n^2,$$

where $\xi_i$ are independent and normally distributed with parameters $(0, 1)$. This is the
so-called chi-squared distribution with $n$ degrees of freedom playing an important
role in statistics.

To find the distribution of $\chi_n^2$ it suffices to note that, by virtue of the equality

$$\mathbf{P}(\chi_1^2 < x) = \mathbf{P}(|\xi_1| < \sqrt{x}) = \frac{2}{\sqrt{2\pi}}\int_0^{\sqrt{x}} e^{-u^2/2}\,du,$$

the density of $\chi_1^2$ is equal to

$$\frac{1}{\sqrt{2\pi}}\,e^{-x/2}x^{-1/2} = f(x;1/2,1/2), \qquad \chi_1^2 \subset\!= \Gamma_{1/2,1/2}.$$

This means that the ch.f. of $\chi_n^2$ is

$$\varphi^n(t;1/2,1/2) = (1 - 2it)^{-n/2} = \varphi(t;1/2,n/2)$$

and corresponds to the density $f(x;1/2,n/2)$.

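Another special case of the gamma distribution is the exponential distribution
$\Gamma_\alpha = \Gamma_{\alpha,1}$ with density

$$f(x;\alpha,1) = \alpha e^{-\alpha x}, \quad x \ge 0,$$

and characteristic function

$$\varphi(t;\alpha,1) = \Big(1 - \frac{it}{\alpha}\Big)^{-1}.$$

We leave it to the reader to verify with the help of ch.f.s that if $\xi_j \subset\!= \Gamma_{\alpha_j}$ and are
independent, $\alpha_j \ne \alpha_l$ for $j \ne l$, then

$$\mathbf{P}\Big(\sum_{j=1}^n \xi_j > x\Big) = \sum_{j=1}^n e^{-\alpha_j x}\prod_{\substack{l=1\\ l\ne j}}^n \Big(1 - \frac{\alpha_j}{\alpha_l}\Big)^{-1}.$$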

In various applications (in particular, in queueing theory, cf. Sect. 12.4), the
so-called Erlang distribution is also of importance. This is a distribution with density
$f(x;\alpha,\lambda)$ for integer $\lambda$. The Erlang distribution is clearly a $\lambda$-fold convolution of
the exponential distribution with itself.

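We find the expectation and variance of a random variable $\xi$ that has the gamma
distribution with parameters $(\alpha, \lambda)$:

$$\mathbf{E}\xi = -i\varphi'(0;\alpha,\lambda) = \frac{\lambda}{\alpha}, \qquad \mathbf{E}\xi^2 = -\varphi''(0;\alpha,\lambda) = \frac{\lambda(\lambda+1)}{\alpha^2},$$

$$\mathrm{Var}(\xi) = \frac{\lambda(\lambda+1)}{\alpha^2} - \Big(\frac{\lambda}{\alpha}\Big)^2 = \frac{\lambda}{\alpha^2}.$$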
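
As a quick sanity check of these formulas, the moments can be recovered numerically from the ch.f. $\varphi(t) = (1 - it/\alpha)^{-\lambda}$ by finite differences at $t = 0$. The sketch below is only an illustration: the helper names, the step `h` and the values $\alpha = 2$, $\lambda = 3.5$ are our own assumptions.

```python
def gamma_chf(t, alpha, lam):
    """Ch.f. of the gamma distribution: (1 - i t / alpha) ** (-lam)."""
    return (1 - 1j * t / alpha) ** (-lam)


def moments_from_chf(phi, h=1e-4):
    """Approximate E xi = -i phi'(0) and Var(xi) = -phi''(0) - (E xi)^2."""
    d1 = (phi(h) - phi(-h)) / (2 * h)               # central difference for phi'(0)
    d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2   # second difference for phi''(0)
    mean = (-1j * d1).real
    second_moment = (-d2).real
    return mean, second_moment - mean**2


alpha, lam = 2.0, 3.5  # illustrative parameter values
mean, var = moments_from_chf(lambda t: gamma_chf(t, alpha, lam))
print(mean, lam / alpha)       # both close to 1.75
print(var, lam / alpha**2)     # both close to 0.875
```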
Distributions from the gamma family, and especially the exponential ones, are
often (and justifiably) used to approximate distributions in various applied problems.
We will present three relevant examples.

Example 7.6.1 Consider a complex device. The failure of at least one of $n$ parts
comprising the device means the breakdown of the whole device. The lifetime
distribution of any of the parts is usually well described by the exponential law. (The
reasons for this could be understood with the help of the Poisson theorem on rare
events. See also Example 2.4.1 and Chap. 19.)

Thus if the lifetimes $\xi_j$ of the parts are independent, and for the part number $j$
one has

$$\mathbf{P}(\xi_j > x) = e^{-\alpha_j x}, \quad x > 0,$$

then the lifetime of the whole device will be equal to $\eta_n = \min(\xi_1, \ldots, \xi_n)$ and we
will get

$$\mathbf{P}(\eta_n > x) = \mathbf{P}\Big(\bigcap_{j=1}^n \{\xi_j > x\}\Big) = \prod_{j=1}^n \mathbf{P}(\xi_j > x) = \exp\Big\{-x\sum_{i=1}^n \alpha_i\Big\}.$$

This means that $\eta_n$ will also have the exponential distribution, and since
$\mathbf{E}\xi_j = 1/\alpha_j$, the mean failure-free operation time of the device will be equal to

$$\mathbf{E}\eta_n = \Big(\sum_{i=1}^n \frac{1}{\mathbf{E}\xi_i}\Big)^{-1}.$$
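
The formula for $\mathbf{E}\eta_n$ is easy to test by simulation. The following Python sketch is a hedged illustration (the rates 0.5, 1.0, 1.5, the number of trials and the function names are our own assumptions): it compares the empirical mean lifetime of the device with $1/\sum_i \alpha_i$.

```python
import random


def mean_device_lifetime(rates):
    """Theoretical E eta_n = 1 / (alpha_1 + ... + alpha_n)."""
    return 1.0 / sum(rates)


def simulate_device_lifetime(rates, trials, seed=0):
    """Empirical mean of eta_n = min(xi_1, ..., xi_n) with exponential lifetimes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # expovariate takes the rate alpha_j, so that E xi_j = 1 / alpha_j
        total += min(rng.expovariate(a) for a in rates)
    return total / trials


rates = [0.5, 1.0, 1.5]  # illustrative failure rates alpha_j
print(mean_device_lifetime(rates))               # 1/3
print(simulate_device_lifetime(rates, 200_000))  # should be close to 1/3
```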


Example 7.6.2 Now turn to the distribution of $\zeta_n = \max(\xi_1, \ldots, \xi_n)$, where $\xi_i$ are
independent and all have the Γ-distribution with parameters $(\alpha, \lambda)$. We could
consider, for instance, a queueing system with $n$ channels. (That could be, say, a
multiprocessor computer solving a problem using the complete enumeration algorithm,
each of the processors of the machine checking a separate variant.) Channel number
$i$ is busy for a random time $\xi_i$. After what time will the whole system be free? This
random time will clearly have the same distribution as $\zeta_n$.

Since the $\xi_i$ are independent, we have

$$\mathbf{P}(\zeta_n < x) = \mathbf{P}\Big(\bigcap_{j=1}^n \{\xi_j < x\}\Big) = \big[\mathbf{P}(\xi_1 < x)\big]^n. \tag{7.6.1}$$


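If $n$ is large, then for approximate calculations we could find the limiting
distribution of $\zeta_n$ as $n \to \infty$. Note that, for any fixed $x$, $\mathbf{P}(\zeta_n < x) \to 0$ as $n \to \infty$.

Assuming for simplicity that $\alpha = 1$ (the general case can be reduced to this one
by changing the scale), we apply L'Hospital's rule to see that, as $x \to \infty$,

$$\mathbf{P}(\xi_j > x) = \frac{1}{\Gamma(\lambda)}\int_x^\infty y^{\lambda-1}e^{-y}\,dy \sim \frac{x^{\lambda-1}}{\Gamma(\lambda)}\,e^{-x}.$$

Letting $n \to \infty$ and

$$x = x(n) = \ln\big[n(\ln n)^{\lambda-1}/\Gamma(\lambda)\big] + u, \quad u = \mathrm{const},$$

we get

$$\mathbf{P}(\xi_j > x) \sim \frac{(\ln n)^{\lambda-1}}{\Gamma(\lambda)}\cdot\frac{\Gamma(\lambda)}{n(\ln n)^{\lambda-1}}\,e^{-u} = \frac{e^{-u}}{n}.$$

Therefore for such $x$ and $n \to \infty$ we obtain by (7.6.1) that

$$\mathbf{P}(\zeta_n < x) = \Big(1 - \frac{e^{-u}}{n}(1 + o(1))\Big)^n \to e^{-e^{-u}}.$$

Thus we have established the existence of the limit

$$\lim_{n\to\infty}\mathbf{P}\Big(\zeta_n - \ln\Big[\frac{n(\ln n)^{\lambda-1}}{\Gamma(\lambda)}\Big] < u\Big) = e^{-e^{-u}}$$

or, which is the same, that

$$\zeta_n - \ln\Big[\frac{n(\ln n)^{\lambda-1}}{\Gamma(\lambda)}\Big] \subset\!\Rightarrow F_0, \qquad F_0(u) = e^{-e^{-u}}.$$

In other words, for large $n$ the variable $\zeta_n$ admits the representation

$$\zeta_n \approx \ln\Big[\frac{n(\ln n)^{\lambda-1}}{\Gamma(\lambda)}\Big] + \zeta^0, \quad \text{where } \zeta^0 \subset\!= F_0.$$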
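
In the special case $\lambda = 1$ (exponential summands, $\alpha = 1$) the centring reduces to $\ln n$ and formula (7.6.1) can be evaluated exactly, which gives a simple way to see the convergence to $F_0$. The sketch below is an illustration under that assumption; the function names are our own.

```python
import math


def max_cdf_exponential(n, u):
    """P(zeta_n < ln n + u) for the maximum of n independent Exp(1) variables, by (7.6.1)."""
    x = math.log(n) + u
    if x <= 0:
        return 0.0
    return (1.0 - math.exp(-x)) ** n


def limit_cdf(u):
    """The limiting distribution function F_0(u) = exp(-exp(-u))."""
    return math.exp(-math.exp(-u))


for u in (-1.0, 0.0, 2.0):
    print(u, max_cdf_exponential(10**6, u), limit_cdf(u))
```

For $\lambda \ne 1$ the same comparison would require the incomplete gamma function, and the convergence is noticeably slower because of the factor $(\ln n)^{\lambda-1}$.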


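Example 7.6.3 Let $\xi_1$ and $\xi_2$ be independent with $\xi_1 \subset\!= \Gamma_{\alpha,\lambda_1}$ and $\xi_2 \subset\!= \Gamma_{\alpha,\lambda_2}$. What
is the distribution of $\xi_1/(\xi_1 + \xi_2)$? We will make use of Theorem 4.9.2. Since the
joint density $f(x, y)$ of $\xi_1$ and $\eta = \xi_1 + \xi_2$ is equal to

$$f(x, y) = f(x;\alpha,\lambda_1)\,f(y - x;\alpha,\lambda_2),$$

the density of $\eta$ is

$$q(y) = f(y;\alpha,\lambda_1 + \lambda_2),$$

and the conditional density $f(x \mid y)$ of $\xi_1$ given $\eta = y$ is equal to

$$f(x \mid y) = \frac{f(x, y)}{q(y)} = \frac{\Gamma(\lambda_1 + \lambda_2)}{\Gamma(\lambda_1)\Gamma(\lambda_2)}\cdot\frac{x^{\lambda_1-1}(y - x)^{\lambda_2-1}}{y^{\lambda_1+\lambda_2-1}}, \quad x \in [0, y].$$

By the formulas from Sect. 3.2 the conditional density of $\xi_1/y = \xi_1/(\xi_1 + \xi_2)$ (given
the same condition $\xi_1 + \xi_2 = y$) is equal to

$$y f(yx \mid y) = \frac{\Gamma(\lambda_1 + \lambda_2)}{\Gamma(\lambda_1)\Gamma(\lambda_2)}\,x^{\lambda_1-1}(1 - x)^{\lambda_2-1}, \quad x \in [0, 1].$$

This distribution does not depend on $y$ (nor on $\alpha$). Hence the unconditional density
of $\xi_1/(\xi_1 + \xi_2)$ is given by the same formula. We obtain the so-called beta
distribution $B_{\lambda_1,\lambda_2}$ with parameters $\lambda_1$ and $\lambda_2$ defined on the interval $[0, 1]$. In particular,
for $\lambda_1 = \lambda_2 = 1$, the distribution is uniform: $B_{1,1} = U_{0,1}$.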
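
The independence of the answer from $\alpha$ can be illustrated by simulation. The sketch below is only an illustration: the function name and parameter values are our own, and it uses the standard fact (not derived above) that the mean of the beta distribution $B_{\lambda_1,\lambda_2}$ equals $\lambda_1/(\lambda_1 + \lambda_2)$.

```python
import random


def simulate_ratio_mean(alpha, lam1, lam2, trials=100_000, seed=1):
    """Monte Carlo estimate of E[xi_1 / (xi_1 + xi_2)] for independent gamma variables."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # random.gammavariate takes (shape, scale); the rate alpha gives scale 1/alpha
        x1 = rng.gammavariate(lam1, 1.0 / alpha)
        x2 = rng.gammavariate(lam2, 1.0 / alpha)
        total += x1 / (x1 + x2)
    return total / trials


# Two different values of alpha; both estimates should be close to 2 / (2 + 3) = 0.4
print(simulate_ratio_mean(1.0, 2.0, 3.0))
print(simulate_ratio_mean(5.0, 2.0, 3.0))
```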
7.7 Generating Functions. Application to Branching Processes.
A Problem on Extinction

7.7.1 Generating Functions

We already know that if a random variable $\xi$ is integer-valued, i.e.

$$\mathbf{P}\Big(\bigcup_k \{\xi = k\}\Big) = 1,$$

then the ch.f. $\varphi_\xi(t)$ will actually be a function of $z = e^{it}$, and, along with its ch.f.,
the distribution of $\xi$ can be specified by its generating function

$$p_\xi(z) := \mathbf{E}z^\xi = \sum_k z^k\,\mathbf{P}(\xi = k).$$

The inversion formula can be written here as

$$\mathbf{P}(\xi = k) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-itk}\varphi_\xi(t)\,dt = \frac{1}{2\pi i}\oint_{|z|=1} z^{-k-1}p_\xi(z)\,dz. \tag{7.7.1}$$

As was already noted (see Sect. 7.2), relation (7.7.1) is simply the formula for
Fourier coefficients (since $e^{itk} = \cos tk + i\sin tk$).

If $\xi$ and $\eta$ are independent random variables, then the distribution of $\xi + \eta$ will
be given by the convolution of the sequences $\mathbf{P}(\xi = k)$ and $\mathbf{P}(\eta = k)$:

$$\mathbf{P}(\xi + \eta = n) = \sum_{k=-\infty}^{\infty}\mathbf{P}(\xi = k)\,\mathbf{P}(\eta = n - k)$$

(the total probability formula). To this convolution there corresponds the product of
the generating functions:

$$p_{\xi+\eta}(z) = p_\xi(z)\,p_\eta(z).$$
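
The correspondence between convolution and multiplication of generating functions can be seen on a small example. The following Python sketch is an illustration only (the two distributions and the helper names are our own choices, and it is restricted to nonnegative integer-valued variables with finitely many values).

```python
def convolve(p, q):
    """Distribution of xi + eta for independent xi, eta with p[k] = P(xi = k), q[k] = P(eta = k)."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r


def gen_fn(p, z):
    """Generating function sum_k z**k * P(xi = k) evaluated at the point z."""
    return sum(pk * z**k for k, pk in enumerate(p))


bernoulli = [0.7, 0.3]       # P(xi = 0), P(xi = 1), i.e. p = 0.3
die = [0.0] + [1 / 6] * 6    # a fair die taking values 1..6
total = convolve(bernoulli, die)

z = 0.4
lhs = gen_fn(total, z)                       # generating function of the sum
rhs = gen_fn(bernoulli, z) * gen_fn(die, z)  # product of the generating functions
print(lhs, rhs)
```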

It is clear from the examples considered in Sect. 7.1 that the generating functions of
random variables distributed according to the Bernoulli and Poisson laws are

$$p_\xi(z) = 1 + p(z - 1), \qquad p_\xi(z) = \exp\{\lambda(z - 1)\},$$

respectively.

One can see from the definition of the generating function that, for a nonnegative
random variable $\xi \ge 0$, the function $p_\xi(z)$ is defined for $|z| \le 1$ and is analytic in
the domain $|z| < 1$.


7.7.2 The Simplest Branching Processes

Now we turn to sequences of random variables which describe the so-called
branching processes. We have already encountered a simple example of such a process
when describing a chain reaction scheme in Example 4.4.4. Consider a more general
scheme of a branching process. Imagine particles that can produce other particles
of the same type; these could be neutrons in chain reactions, bacteria reproducing
according to certain laws etc. Assume that initially there is a single particle (the
"null generation") that, as a result of a "division" act, transforms with probabilities
$f_k$, $k = 0, 1, 2, \ldots$, into $k$ particles of the same type,

$$\sum_{k=0}^{\infty} f_k = 1.$$

The new particles form the "first generation". Each of the particles from that
generation behaves in the same way as the initial particle, independently of what
happened before and of the other particles from that generation. Thus we obtain the
"second generation", and so on. Denote by $\zeta_n$ the number of particles in the $n$-th
generation. To describe the sequence $\zeta_n$ introduce, as we did in Example 4.4.4,
independent sequences of independent identically distributed random variables

$$\{\xi_j^{(1)}\}_{j=1}^{\infty},\ \{\xi_j^{(2)}\}_{j=1}^{\infty},\ \ldots,$$

where $\xi_j^{(n)}$ have the distribution

$$\mathbf{P}(\xi_j^{(n)} = k) = f_k, \quad k = 0, 1, \ldots.$$

Then the sequence $\zeta_n$ can be represented as

$$\zeta_0 = 1, \qquad \zeta_1 = \xi_1^{(1)}, \qquad \ldots, \qquad \zeta_n = \xi_1^{(n)} + \cdots + \xi_{\zeta_{n-1}}^{(n)}.$$

These are sums of random numbers of random variables. Since $\xi_1^{(n)}, \xi_2^{(n)}, \ldots$ do not
depend on $\zeta_{n-1}$, for the generating function $f_{(n)}(z) = \mathbf{E}z^{\zeta_n}$ we obtain by the total
probability formula that

$$f_{(n)}(z) = \sum_{k=0}^{\infty}\mathbf{P}(\zeta_{n-1} = k)\,\mathbf{E}z^{\xi_1^{(n)}+\cdots+\xi_k^{(n)}} = \sum_{k=0}^{\infty}\mathbf{P}(\zeta_{n-1} = k)\,f^k(z) = f_{(n-1)}(f(z)), \tag{7.7.2}$$

where

$$f(z) := f_{(1)}(z) = \mathbf{E}z^{\xi_1^{(n)}} = \sum_{k=0}^{\infty} f_k z^k.$$
Fig. 7.1 Finding the extinction probability of a branching process: it is given
by the smaller of the two solutions to the equation $z = f(z)$


Denote by $f_n(z)$ the $n$-th iterate of the function $f(z)$, i.e. $f_1(z) = f(z)$, $f_2(z) =
f(f(z))$, $f_3(z) = f(f_2(z))$ and so on. Then we conclude from (7.7.2) by induction
that the generating function of $\zeta_n$ is equal to the $n$-th iterate of $f(z)$:

$$\mathbf{E}z^{\zeta_n} = f_n(z).$$

From this one can easily obtain, by differentiating at the point $z = 1$, recursive
relations for the moments of $\zeta_n$.

How can one find the extinction probability of the process? By extinction we will
understand the event that all $\zeta_n$ starting from some $n$ will be equal to 0. (If $\zeta_n = 0$
then clearly $\zeta_{n+1} = \zeta_{n+2} = \cdots = 0$, because $\mathbf{P}(\zeta_{n+1} = 0 \mid \zeta_n = 0) = 1$.) Set $A_k =
\{\zeta_k = 0\}$. Then extinction is the event $\bigcup_{k=1}^{\infty} A_k$. Since $A_n \subset A_{n+1}$, the extinction
probability $q$ is equal to $q = \lim_{n\to\infty}\mathbf{P}(A_n)$.

Theorem 7.7.1 The extinction probability $q$ is equal to the smallest nonnegative
solution of the equation $q = f(q)$.

Proof One has $\mathbf{P}(A_n) = f_n(0) \le 1$, and this sequence is non-decreasing (since
$A_n \subset A_{n+1}$). Passing in the equality

$$f_{n+1}(0) = f(f_n(0)) \tag{7.7.3}$$

to the limit as $n \to \infty$, we obtain

$$q = f(q), \quad q \le 1.$$

This is an equation for the extinction probability. Let us analyse its solutions. The
function $f(z)$ is convex (as $f''(z) \ge 0$) and non-decreasing in the domain $z \ge 0$
and $f'(1) = m$ is the mean number of offspring of a single particle. First assume
that $\mathbf{P}(\xi_1^{(1)} = 1) < 1$. If $m \le 1$ then $f(z) > z$ for $z < 1$ and hence $q = 1$. If $m > 1$
then by convexity of $f$ the equation $q = f(q)$ has exactly two solutions on the
interval $[0, 1]$: $q_1 < 1$ and $q_2 = 1$ (see Fig. 7.1). Assume that $q = q_2 = 1$. Then the
sequence $\delta_n = 1 - f_n(0)$ will monotonically converge to 0, and $f(1 - \delta_n) < 1 - \delta_n$
for sufficiently large $n$. Therefore, for such $n$,

$$\delta_{n+1} = 1 - f(1 - \delta_n) > \delta_n,$$

which is a contradiction as $\delta_n$ is a decreasing sequence. This means that $q = q_1 < 1$.
Finally, in the case $\mathbf{P}(\xi_1^{(1)} = 1) = f_1 = 1$ one clearly has $f(z) \equiv z$ and $q = 0$. The
theorem is proved. □
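
The proof suggests a simple numerical procedure: iterate $q_{n+1} = f(q_n)$ starting from $q_0 = 0$, so that $q_n = \mathbf{P}(A_n)$ increases to $q$. The following Python sketch is an illustration under our own assumptions (finitely many offspring probabilities, a hypothetical stopping tolerance `tol`); the function names are not from the text.

```python
def offspring_gf(probs):
    """Return the function f(z) = sum_k f_k z**k for offspring probabilities f_0, f_1, ..."""
    def f(z):
        return sum(fk * z**k for k, fk in enumerate(probs))
    return f


def extinction_probability(probs, tol=1e-12, max_iter=1_000_000):
    """Iterate q_{n+1} = f(q_n) from q_0 = 0 until successive values differ by less than tol."""
    f = offspring_gf(probs)
    q = 0.0
    for _ in range(max_iter):
        q_next = f(q)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


# Supercritical example: m = 0.25 + 2 * 0.5 = 1.25 > 1; the roots of q = f(q) are 0.5 and 1
print(extinction_probability([0.25, 0.25, 0.5]))  # close to 0.5
# Subcritical example: m = 0.25 + 2 * 0.25 = 0.75 < 1, so q = 1
print(extinction_probability([0.5, 0.25, 0.25]))  # close to 1.0
```

In the critical case $m = 1$ the iteration still converges to 1, but only at the rate $1/n$, so the stopping rule above becomes unreliable there.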


Now consider in more detail the case $m = 1$, which is called critical. We know
that in this case the extinction probability $q$ equals 1. Let $q_n = \mathbf{P}(A_n) = f_n(0)$ be
the probability of extinction by time $n$. How fast does $q_n$ converge to 1? By (7.7.3)
one has $q_{n+1} = f(q_n)$. Therefore the probability $p_n = 1 - q_n$ of non-extinction of
the process by time $n$ satisfies the relation

$$p_{n+1} = g(p_n), \qquad g(x) = 1 - f(1 - x).$$

It is also clear that $\gamma_n = p_n - p_{n+1}$ is the probability that extinction will occur
on step $n$.

Theorem 7.7.2 If $m = f'(1) = 1$ and $0 < b := f''(1) < \infty$ then $\gamma_n \sim \dfrac{2}{bn^2}$ and
$p_n \sim \dfrac{2}{bn}$ as $n \to \infty$.


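Proof If the second moment of the number of offspring of a single particle is finite
($b < \infty$) then the derivative $g''(0) = -b$ exists and therefore, since $g(0) = 0$ and
$g'(0) = f'(1) = 1$, one has

$$g(x) = x - \frac{b}{2}x^2 + o(x^2), \quad x \to 0.$$

Putting $x = p_n \to 0$, we find for the sequence $a_n = 1/p_n$ that

$$a_{n+1} - a_n = \frac{p_n - p_{n+1}}{p_n p_{n+1}} = \frac{b p_n^2(1 + o(1))}{2p_n^2(1 - bp_n/2 + o(p_n))} \to \frac{b}{2},$$

$$a_n = a_1 + \sum_{k=1}^{n-1}(a_{k+1} - a_k) \sim \frac{bn}{2}, \qquad p_n \sim \frac{2}{bn}, \qquad \gamma_n = p_n - g(p_n) \sim \frac{bp_n^2}{2} \sim \frac{2}{bn^2}.$$

The theorem is proved. □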
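
The asymptotics $p_n \sim 2/(bn)$ can be observed numerically by iterating $p_{n+1} = g(p_n)$. The sketch below is an illustration for one particular critical offspring law chosen by us, $f(z) = (1 + z)^2/4$ (so $f_0 = 1/4$, $f_1 = 1/2$, $f_2 = 1/4$, $m = 1$, $b = f''(1) = 1/2$); the function names are our own.

```python
def f(z):
    """Offspring generating function (1 + z)^2 / 4: critical, with b = f''(1) = 1/2."""
    return (1.0 + z) ** 2 / 4.0


def g(x):
    """g(x) = 1 - f(1 - x), so that p_{n+1} = g(p_n)."""
    return 1.0 - f(1.0 - x)


def survival_probability(n):
    """p_n = P(zeta_n > 0), obtained by iterating g starting from p_0 = 1."""
    p = 1.0
    for _ in range(n):
        p = g(p)
    return p


b = 0.5
n = 100_000
p_n = survival_probability(n)
gamma_n = p_n - g(p_n)
print(p_n * b * n / 2)             # should be close to 1
print(gamma_n * b * n * n / 2)     # should be close to 1
```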


Now consider the problem on the distribution of the number of particles $\zeta_n$ given
$\zeta_n > 0$.

Theorem 7.7.3 Under the assumptions of Theorem 7.7.2, the conditional distribution
of $p_n\zeta_n$ (or $2\zeta_n/(bn)$) given $\zeta_n > 0$ converges as $n \to \infty$ to the exponential
distribution:

$$\mathbf{P}(p_n\zeta_n > x \mid \zeta_n > 0) \to e^{-x}, \quad x > 0.$$

The above statement means, in particular, that given $\zeta_n > 0$, the number of
particles $\zeta_n$ is of order $n$ as $n \to \infty$.

Proof Consider the Laplace transform (see Property 6 in Sect. 7.1) of the
conditional distribution of $p_n\zeta_n$ (given $\zeta_n > 0$):

$$\mathbf{E}(e^{-sp_n\zeta_n} \mid \zeta_n > 0) = \frac{1}{p_n}\sum_{k=1}^{\infty} e^{-skp_n}\,\mathbf{P}(\zeta_n = k). \tag{7.7.4}$$

We will make use of the fact that, if we could find an $N$ such that $e^{-sp_n} = 1 - p_N$,
which is the probability of extinction by time $N$, then the right-hand side of (7.7.4)
will give, by the total probability formula, the conditional probability of the
extinction of the process by time $n + N$ given its non-extinction at time $n$. We can evaluate
this probability using Theorem 7.7.2.

Since $p_n \to 0$, for any fixed $s > 0$ one has

$$e^{-sp_n} - 1 \sim -sp_n \sim -\frac{2s}{bn}.$$

Clearly, one can always choose $N \sim n/s$, $s_n \sim s$, $s_n \downarrow s$ such that $e^{-s_np_n} - 1 = -p_N$.
Therefore $e^{-s_np_nk} = (1 - p_N)^k$ and the right-hand side of (7.7.4) can be rewritten
for $s = s_n$ as

$$\frac{1}{p_n}\sum_{k=1}^{\infty}\mathbf{P}(\zeta_n = k)(1 - p_N)^k = \frac{1}{p_n}\,\mathbf{P}(\zeta_n > 0,\ \zeta_{n+N} = 0) = \frac{p_n - p_{n+N}}{p_n}$$

$$= 1 - \frac{p_{n+N}}{p_n} \sim 1 - \frac{n}{n + N} = \frac{N}{n + N} \to \frac{1}{1 + s}.$$

Now note that

$$\mathbf{E}(e^{-sp_n\zeta_n} \mid \zeta_n > 0) - \mathbf{E}(e^{-s_np_n\zeta_n} \mid \zeta_n > 0) = \mathbf{E}\big[e^{-sp_n\zeta_n}\big(1 - e^{-(s_n-s)p_n\zeta_n}\big) \mid \zeta_n > 0\big].$$

Since $e^{-\alpha} \le 1$ and $1 - e^{-\alpha} \le \alpha$ for $\alpha \ge 0$, and $\mathbf{E}\zeta_n = 1$, $\mathbf{E}(\zeta_n \mid \zeta_n > 0) = 1/p_n$, it is
easily seen that the positive (since $s_n > s$) difference of the expectations in the last
formula does not exceed

$$(s_n - s)\,p_n\,\mathbf{E}(\zeta_n \mid \zeta_n > 0) = s_n - s \to 0.$$

Therefore the Laplace transform (7.7.4) converges, as $n \to \infty$, to $1/(1 + s)$.
Since $1/(1 + s)$ is the Laplace transform of the exponential distribution:

$$\int_0^\infty e^{-sx-x}\,dx = \frac{1}{1 + s},$$

we conclude by the continuity theorem (see the remark after Theorem 7.3.1 in
Sect. 7.3) that the conditional distribution of interest converges to the exponential
law.⁶ □

In Sect. 15.4 (Example 15.4.1) we will obtain, as consequences of martingale
convergence theorems, assertions about the behaviour of $\zeta_n$ as $n \to \infty$ for branching
processes in the case where the mean number of offspring exceeds 1 ($\mu > 1$, the
so-called supercritical processes).


⁶ The simple proof of Theorem 7.7.3 that we presented here is due to K.A. Borovkov.


Chapter  8 

Sequences  of  Independent  Random  Variables. 
Limit  Theorems 


Abstract The chapter opens with proofs of Khintchin's (weak) Law of Large Numbers
(Sect. 8.1) and the Central Limit Theorem (Sect. 8.2) in the case of independent
identically distributed summands, both using the apparatus of characteristic
functions. Section 8.3 establishes general conditions for the Weak Law of Large
Numbers for general sequences of independent random variables and also conditions for
the respective convergence in mean. Section 8.4 presents the Central Limit Theorem
in the triangular array scheme (the Lindeberg-Feller theorem) and its corollaries,
illustrated by several insightful examples. After that, in Sect. 8.5 an alternative
method of compositions is introduced and used to prove the Central Limit Theorem
in the same situation, establishing an upper bound for the convergence rate for
the uniform distance between the distribution functions in the case of finite third
moments. This is followed by an extension of the above results to the multivariate
case in Sect. 8.6. Section 8.7 presents important material not to be found in other
textbooks: the so-called integro-local limit theorems on convergence to the normal
distribution (the Stone-Shepp and Gnedenko theorems), including versions for sums
of random variables depending on a parameter. These results will be of crucial
importance in Chap. 9, when proving theorems on exact asymptotic behaviour of large
deviation probabilities. The chapter concludes with Sect. 8.8 establishing integral,
integro-local and local theorems on convergence of the distributions of scaled sums
of independent identically distributed random variables to non-normal stable laws.


8.1  The  Law  of  Large  Numbers 

Theorem 8.1.1 (Khintchin's Law of Large Numbers) Let $\{\xi_n\}_{n=1}^{\infty}$ be a sequence
of independent identically distributed random variables having a finite expectation
$\mathbf{E}\xi_n = a$ and let $S_n := \xi_1 + \cdots + \xi_n$. Then

$$\frac{S_n}{n} \xrightarrow{p} a \quad \text{as } n \to \infty.$$

The above assertion together with Theorems 6.1.6 and 6.1.7 imply the following.

Corollary 8.1.1 Under the conditions of Theorem 8.1.1, as well as convergence of
$S_n/n$ in probability, convergence in mean also takes place:

$$\mathbf{E}\Big|\frac{S_n}{n} - a\Big| \to 0 \quad \text{as } n \to \infty.$$


Note that the condition of independence of $\xi_k$ and the very assertion of the
theorem assume that all the random variables $\xi_k$ are given on a common probability
space.

From the physical point of view, the stated law of large numbers is the simplest
ergodic theorem which means, roughly speaking, that for random variables
their "time averages" and "space averages" coincide. This applies to an even greater
extent to the strong law of large numbers, by virtue of which $S_n/n \to a$ with
probability 1.

Under more strict assumptions (existence of variance) Theorem 8.1.1 was
obtained in Sect. 4.7 as a consequence of Chebyshev's inequality.


Proof of Theorem 8.1.1 We have to prove that, for any $\varepsilon > 0$,

$$\mathbf{P}\Big(\Big|\frac{S_n}{n} - a\Big| > \varepsilon\Big) \to 0$$

as $n \to \infty$. The above relation is equivalent to the weak convergence of distributions
$S_n/n \subset\!\Rightarrow \mathbf{I}_a$. Therefore, by the continuity theorem and Example 7.1.1 it suffices to
show that, for any fixed $t$,

$$\varphi_{S_n/n}(t) \to e^{iat}.$$

The ch.f. $\varphi(t)$ of the random variable $\xi_k$ has, in a certain neighbourhood of 0, the
property $|\varphi(t) - 1| < 1/2$. Therefore for such $t$ one can define the function $l(t) =
\ln\varphi(t)$ (we take the principal value of the logarithm). Since $\xi_n$ has finite expectation,
the derivative

$$l'(0) = \frac{\varphi'(0)}{\varphi(0)} = ia$$

exists. For each fixed $t$ and sufficiently large $n$, the value of $l(t/n)$ is defined and

$$\varphi_{S_n/n}(t) = \varphi^n(t/n) = e^{l(t/n)n}.$$

Since $l(0) = 0$, one has

$$e^{l(t/n)n} = \exp\Big\{t\,\frac{l(t/n) - l(0)}{t/n}\Big\} \to e^{l'(0)t} = e^{iat}$$

as $n \to \infty$. The theorem is proved. □
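
The key step of the proof, $\varphi^n(t/n) \to e^{iat}$, can be observed directly for a concrete distribution. The sketch below is an illustration for the exponential law with $\alpha = 1$, for which $\varphi(t) = (1 - it)^{-1}$ and $a = 1$; the function names and the chosen values of $n$ and $t$ are our own assumptions.

```python
import cmath


def exp_chf(t):
    """Ch.f. of the exponential distribution with alpha = 1 (so that a = 1)."""
    return 1.0 / (1.0 - 1j * t)


def chf_of_mean(phi, n, t):
    """Ch.f. of S_n / n for i.i.d. summands with ch.f. phi, i.e. phi(t/n) ** n."""
    return phi(t / n) ** n


t = 2.0
for n in (10, 1000, 10**6):
    # the distance to exp(i a t) with a = 1 should decrease roughly like t^2 / (2 n)
    print(n, abs(chf_of_mean(exp_chf, n, t) - cmath.exp(1j * t)))
```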
8.2 The Central Limit Theorem for Identically Distributed
Random Variables

Let, as before, $\{\xi_n\}$ be a sequence of independent identically distributed random
variables. But now we assume, along with the expectation $\mathbf{E}\xi_n = a$, the existence
of the variance $\mathrm{Var}\,\xi_n = \sigma^2$. We retain the notation $S_n = \xi_1 + \cdots + \xi_n$ for sums of
our random variables and $\Phi(x)$ for the normal distribution function with parameters
$(0, 1)$. Introduce the sequence of random variables

$$\zeta_n = \frac{S_n - an}{\sigma\sqrt{n}}.$$

Theorem 8.2.1 If $0 < \sigma^2 < \infty$, then $\mathbf{P}(\zeta_n < x) \to \Phi(x)$ uniformly in $x$ ($-\infty <
x < \infty$) as $n \to \infty$.

In such a case, the sequence $\{\zeta_n\}$ is said to be asymptotically normal.

It follows from $\zeta_n \Rightarrow \zeta \subset\!= \Phi_{0,1}$, $\mathbf{E}\zeta_n^2 = \mathbf{E}\zeta^2 = 1$ and from Lemma 6.2.3
that the sequence $\{\zeta_n^2\}$ is uniformly integrable. Therefore, as well as the weak
convergence $\zeta_n \Rightarrow \zeta$, $\zeta \subset\!= \Phi_{0,1}$ ($\mathbf{E}f(\zeta_n) \to \mathbf{E}f(\zeta)$ for any bounded continuous
$f$), one also has convergence $\mathbf{E}f(\zeta_n) \to \mathbf{E}f(\zeta)$ for any continuous $f$ such that
$|f(x)| < c(1 + x^2)$ (see Theorem 6.2.3).

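Proof of Theorem 8.2.1 The uniform convergence is a consequence of the weak
convergence and continuity of $\Phi(x)$. Further, we may assume without loss of
generality that $a = 0$, for otherwise we could consider the sequence $\{\xi_n' = \xi_n - a\}_{n=1}^{\infty}$
without changing the sequence $\{\zeta_n\}$. Therefore, to prove the required convergence,
it suffices to show that $\varphi_{\zeta_n}(t) \to e^{-t^2/2}$ when $a = 0$. We have

$$\varphi_{\zeta_n}(t) = \varphi^n\Big(\frac{t}{\sigma\sqrt{n}}\Big), \quad \text{where } \varphi(t) = \varphi_{\xi_k}(t).$$

Since $\mathbf{E}\xi_n^2$ exists, $\varphi''(t)$ also exists and, as $t \to 0$, one has

$$\varphi(t) = \varphi(0) + t\varphi'(0) + \frac{t^2}{2}\varphi''(0) + o(t^2) = 1 - \frac{t^2\sigma^2}{2} + o(t^2). \tag{8.2.1}$$

Therefore, as $n \to \infty$,

$$\ln\varphi_{\zeta_n}(t) = n\ln\Big[1 - \frac{\sigma^2}{2}\Big(\frac{t}{\sigma\sqrt{n}}\Big)^2 + o\Big(\frac{t^2}{n}\Big)\Big] = n\Big[-\frac{t^2}{2n} + o\Big(\frac{t^2}{n}\Big)\Big] = -\frac{t^2}{2} + o(1) \to -\frac{t^2}{2}.$$

The theorem is proved. □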
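
For a concrete check of the convergence $\varphi^n(t/(\sigma\sqrt{n})) \to e^{-t^2/2}$, one can take symmetric summands with values $\pm 1$, for which $a = 0$, $\sigma = 1$ and $\varphi(t) = \cos t$. The following Python sketch is an illustration under that assumption; the function names are our own.

```python
import math


def chf_normalised_sum(n, t):
    """Ch.f. of zeta_n = S_n / sqrt(n) for summands equal to +1 or -1 with probability 1/2."""
    return math.cos(t / math.sqrt(n)) ** n


def normal_chf(t):
    """Ch.f. of the standard normal law."""
    return math.exp(-t * t / 2)


for t in (0.5, 1.5, 3.0):
    print(t, chf_normalised_sum(10**6, t), normal_chf(t))
```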
8.3 The Law of Large Numbers for Arbitrary Independent
Random Variables


Now we proceed to elucidating conditions under which the law of large numbers and
the central limit theorem will hold in the case when $\xi_k$ are independent but not
necessarily identically distributed. The problem will not become more complicated if,
from the very beginning, we consider a more general situation where one is given an
arbitrary series $\xi_{1,n}, \ldots, \xi_{n,n}$, $n = 1, 2, \ldots$ of independent random variables, where
the distributions of $\xi_{k,n}$ may depend on $n$. This is the so-called triangular array
scheme.

Put

$$\zeta_n := \sum_{k=1}^n \xi_{k,n}.$$

From the viewpoint of the results to follow, we can assume without loss of generality
that

$$\mathbf{E}\xi_{k,n} = 0. \tag{8.3.1}$$

Assume that the following condition is met: as $n \to \infty$,

$$D_1 := \sum_{k=1}^n \mathbf{E}\min\big(|\xi_{k,n}|, |\xi_{k,n}|^2\big) \to 0. \tag*{$[D_1]$}$$

Theorem 8.3.1 (The Law of Large Numbers) If conditions (8.3.1) and $[D_1]$ are
satisfied, then $\zeta_n \subset\!\Rightarrow \mathbf{I}_0$ or, which is the same, $\zeta_n \xrightarrow{p} 0$ as $n \to \infty$.

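Example 8.3.1 Assume $\xi_k$ do not depend on $n$, $\mathbf{E}\xi_k = 0$ and $\mathbf{E}|\xi_k|^s \le m_s <
\infty$ for $1 < s \le 2$. For such $s$, there exists a sequence $b(n) = o(n)$ such that $n =
o(b^s(n))$. Since, for $\xi_{k,n} = \xi_k/b(n)$,

$$\mathbf{E}\min\big(|\xi_{k,n}|, \xi_{k,n}^2\big) = \mathbf{E}\Big[\Big|\frac{\xi_k}{b(n)}\Big|^2;\ |\xi_k| \le b(n)\Big] + \mathbf{E}\Big[\Big|\frac{\xi_k}{b(n)}\Big|;\ |\xi_k| > b(n)\Big]$$

$$\le \mathbf{E}\Big[\Big|\frac{\xi_k}{b(n)}\Big|^s;\ |\xi_k| \le b(n)\Big] + \mathbf{E}\Big[\Big|\frac{\xi_k}{b(n)}\Big|^s;\ |\xi_k| > b(n)\Big] \le m_s b^{-s}(n),$$

we have

$$D_1 \le n m_s b^{-s}(n) \to 0,$$

and hence $S_n/b(n) \xrightarrow{p} 0$.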

A more general sufficient condition (compared to $m_s < \infty$) for the law of large
numbers is contained in Theorem 8.3.3 below. Theorem 8.1.1 is an evident corollary
of that theorem.

Now consider condition $[D_1]$ in more detail. It can clearly also be written in the
form

$$D_1 = \sum_{k=1}^n \mathbf{E}\big(|\xi_{k,n}|;\ |\xi_{k,n}| > 1\big) + \sum_{k=1}^n \mathbf{E}\big(|\xi_{k,n}|^2;\ |\xi_{k,n}| \le 1\big) \to 0.$$

Next introduce the condition

$$M_1 := \sum_{k=1}^n \mathbf{E}|\xi_{k,n}| \le c < \infty \tag{8.3.2}$$

and the condition

$$M_1(\tau) := \sum_{k=1}^n \mathbf{E}\big(|\xi_{k,n}|;\ |\xi_{k,n}| > \tau\big) \to 0 \tag*{$[M_1]$}$$

for any $\tau > 0$ as $n \to \infty$. Condition $[M_1]$ could be called a Lindeberg type condition
(the Lindeberg condition $[M_2]$ will be introduced in Sect. 8.4).

The following lemma explains the relationship between the introduced
conditions.

Lemma 8.3.1 1. $\{[M_1] \cap (8.3.2)\} \subset [D_1]$. 2. $[D_1] \subset [M_1]$.

That is, conditions $[M_1]$ and (8.3.2) imply $[D_1]$, and condition $[D_1]$ implies
$[M_1]$.

It follows from Lemma 8.3.1 that under condition (8.3.2), conditions $[D_1]$ and
$[M_1]$ are equivalent.

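Proof of Lemma 8.3.1 1. Let conditions (8.3.2) and $[M_1]$ be met. Then, for

$$\tau \le 1, \qquad g_1(x) = \min(|x|, |x|^2),$$

one has

$$D_1 = \sum_{k=1}^n \mathbf{E}g_1(\xi_{k,n}) \le \sum_{k=1}^n \mathbf{E}\big(|\xi_{k,n}|;\ |\xi_{k,n}| > \tau\big) + \sum_{k=1}^n \mathbf{E}\big(|\xi_{k,n}|^2;\ |\xi_{k,n}| \le \tau\big)$$

$$\le M_1(\tau) + \tau\sum_{k=1}^n \mathbf{E}\big(|\xi_{k,n}|;\ |\xi_{k,n}| \le \tau\big) \le M_1(\tau) + \tau M_1(0). \tag{8.3.3}$$

Since $M_1(0) = M_1 \le c$ and $\tau$ can be arbitrarily small, we have $D_1 \to 0$ as $n \to \infty$.

2. Conversely, let condition $[D_1]$ be met. Then, for $\tau \le 1$,

$$M_1(\tau) \le \sum_{k=1}^n \mathbf{E}\big(|\xi_{k,n}|;\ |\xi_{k,n}| > 1\big) + \tau^{-1}\sum_{k=1}^n \mathbf{E}\big(|\xi_{k,n}|^2;\ \tau < |\xi_{k,n}| \le 1\big) \le \tau^{-1}D_1 \to 0 \tag{8.3.4}$$

as $n \to \infty$ for any $\tau > 0$. The lemma is proved. □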


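Let us show that condition $[M_1]$ (as well as $[D_1]$) is essential for the law of large
numbers to hold.

Consider the random variables

$$\xi_{k,n} = \begin{cases} 1 - \dfrac{1}{n} & \text{with probability } \dfrac{1}{n},\\[2mm] -\dfrac{1}{n} & \text{with probability } 1 - \dfrac{1}{n}. \end{cases}$$

For them, $\mathbf{E}\xi_{k,n} = 0$, $\mathbf{E}|\xi_{k,n}| = \frac{2(n-1)}{n^2} \sim \frac{2}{n}$, $M_1 \le 2$, condition (8.3.2) is met, but
$M_1(\tau) \ge \frac{n-1}{n} \ge \frac{1}{2}$ for $n \ge 2$, $\tau < 1/2$, and thus condition $[M_1]$ is not satisfied. Here
the number $\nu_n$ of positive $\xi_{k,n}$, $1 \le k \le n$, converges in distribution to a random
variable $\nu$ having the Poisson distribution with parameter $\lambda = 1$. The sum of the
remaining $\xi_{k,n}$s is equal to $-\frac{n - \nu_n}{n} \xrightarrow{p} -1$. Therefore, $\zeta_n + 1 \subset\!\Rightarrow \Pi_1$ and the law
of large numbers does not hold.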
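
This counterexample can be examined numerically. In it $\zeta_n = \nu_n(1 - 1/n) - (n - \nu_n)/n = \nu_n - 1$, where $\nu_n$ has the binomial distribution with parameters $(n, 1/n)$. The sketch below is an illustration only (the function names are our own): it compares the distribution of $\nu_n$ with the Poisson law $\Pi_1$ and shows that $\mathbf{P}(|\zeta_n| > 1/2)$ stays away from 0.

```python
import math


def nu_pmf(n, k):
    """P(nu_n = k), where nu_n (the number of positive summands) is Binomial(n, 1/n)."""
    return math.comb(n, k) * (1 / n) ** k * (1 - 1 / n) ** (n - k)


def poisson1_pmf(k):
    """P(nu = k) for the Poisson law with parameter 1."""
    return math.exp(-1.0) / math.factorial(k)


def prob_zeta_far_from_zero(n):
    """P(|zeta_n| > 1/2) = P(nu_n != 1), since zeta_n = nu_n - 1."""
    return 1.0 - nu_pmf(n, 1)


n = 10**4
for k in range(4):
    print(k, nu_pmf(n, k), poisson1_pmf(k))
print(prob_zeta_far_from_zero(n))  # close to 1 - 1/e, not to 0
```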

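Each of the conditions $[D_1]$ and $[M_1]$ implies the uniform smallness of $\mathbf{E}|\xi_{k,n}|$:

$$\max_{1\le k\le n}\mathbf{E}|\xi_{k,n}| \to 0 \quad \text{as } n \to \infty. \tag{8.3.5}$$

Indeed, condition $[M_1]$ means that there exists a sufficiently slowly decreasing
sequence $\tau_n \to 0$ such that $M_1(\tau_n) \to 0$. Therefore

$$\max_{k\le n}\mathbf{E}|\xi_{k,n}| \le \max_{k\le n}\big[\tau_n + \mathbf{E}\big(|\xi_{k,n}|;\ |\xi_{k,n}| > \tau_n\big)\big] \le \tau_n + M_1(\tau_n) \to 0. \tag{8.3.6}$$

In particular, (8.3.5) implies the negligibility of the summands $\xi_{k,n}$.

We will say that $\xi_{k,n}$ are negligible, or, equivalently, have property [S], if, for any
$\varepsilon > 0$,

$$\max_{k\le n}\mathbf{P}(|\xi_{k,n}| > \varepsilon) \to 0 \quad \text{as } n \to \infty. \tag*{[S]}$$

Property [S] could also be called uniform convergence of $\xi_{k,n}$ in probability to
zero. Property [S] follows immediately from (8.3.5) and Chebyshev's inequality. It
also follows from stronger relations implied by $[M_1]$:

$$\mathbf{P}\Big(\max_{k\le n}|\xi_{k,n}| > \varepsilon\Big) = \mathbf{P}\Big(\bigcup_{k\le n}\{|\xi_{k,n}| > \varepsilon\}\Big) \le \sum_{k\le n}\mathbf{P}(|\xi_{k,n}| > \varepsilon) \le \varepsilon^{-1}\sum_{k\le n}\mathbf{E}\big(|\xi_{k,n}|;\ |\xi_{k,n}| > \varepsilon\big) \to 0. \tag*{$[S_1]$}$$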

We now turn to proving the law of large numbers. We will give two versions of
the proof. The first one illustrates the classical method of characteristic functions.
The second version¹ is based on elementary inequalities and leads to a stronger
assertion about convergence in mean.

Here is the first version.


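Proof of Theorem 8.3.1² Put

$$\varphi_{k,n}(t) := \mathbf{E}e^{it\xi_{k,n}}, \qquad \Delta_k(t) := \varphi_{k,n}(t) - 1.$$

One has to prove that, for each $t$,

$$\varphi_{\zeta_n}(t) = \mathbf{E}e^{it\zeta_n} = \prod_{k=1}^n \varphi_{k,n}(t) \to 1$$

as $n \to \infty$. By Lemma 7.4.2

$$|\varphi_{\zeta_n}(t) - 1| = \Big|\prod_{k=1}^n \varphi_{k,n}(t) - \prod_{k=1}^n 1\Big| \le \sum_{k=1}^n |\Delta_k(t)| = \sum_{k=1}^n \big|\mathbf{E}e^{it\xi_{k,n}} - 1\big| = \sum_{k=1}^n \big|\mathbf{E}\big(e^{it\xi_{k,n}} - 1 - it\xi_{k,n}\big)\big|.$$

By Lemma 7.4.1 we have (for $g_1(x) = \min(|x|, x^2)$)

$$\big|e^{itx} - 1 - itx\big| \le \min\big(2|tx|,\ t^2x^2/2\big) \le 2g_1(tx) \le 2h(t)g_1(x),$$

where $h(t) = \max(|t|, |t|^2)$. Therefore

$$|\varphi_{\zeta_n}(t) - 1| \le 2h(t)\sum_{k=1}^n \mathbf{E}g_1(\xi_{k,n}) = 2h(t)D_1 \to 0.$$

The theorem is proved. □

The last inequality shows that $|\varphi_{\zeta_n}(t) - 1|$ admits a bound in terms of $D_1$. It
turns out that $\mathbf{E}|\zeta_n|$ also admits a bound in terms of $D_1$. Now we will give the
second version of the proof that actually leads to a stronger variant of the law of
large numbers.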

Theorem 8.3.2 Under conditions (8.3.1) and $[D_1]$ one has $\mathbf{E}|\zeta_n| \to 0$ (i.e.
$\zeta_n \xrightarrow{(1)} 0$).

¹ The second version was communicated to us by A.I. Sakhanenko.

² There exists an alternative "direct" proof of Theorem 8.3.1 using not ch.f.s but the so-called
truncated random variables and estimates of their variances. However, because of what follows, it
is more convenient for us to use here the machinery of ch.f.s.

The assertion of Theorem 8.3.2 clearly means the uniform integrability of $\{\zeta_n\}$;
it implies Theorem 8.3.1, for

$$\mathbf{P}(|\zeta_n| > \varepsilon) \le \mathbf{E}|\zeta_n|/\varepsilon \to 0 \quad \text{as } n \to \infty.$$

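Proof of Theorem 8.3.2 Put

\[ \xi'_{k,n} := \begin{cases} \xi_{k,n} & \text{if } |\xi_{k,n}| \le 1, \\ 0 & \text{otherwise}, \end{cases} \]

and $\xi''_{k,n} := \xi_{k,n} - \xi'_{k,n}$. Then $\xi_{k,n} = \xi'_{k,n} + \xi''_{k,n}$ and $\zeta_n = \zeta'_n + \zeta''_n$ with an obvious convention for the notations $\zeta'_n$, $\zeta''_n$. By the Cauchy-Bunjakovsky inequality,

\[ \begin{aligned} \mathbf{E}|\zeta_n| &\le \mathbf{E}|\zeta'_n - \mathbf{E}\zeta'_n| + \mathbf{E}|\zeta''_n - \mathbf{E}\zeta''_n| \le \sqrt{\mathbf{E}(\zeta'_n - \mathbf{E}\zeta'_n)^2} + \mathbf{E}|\zeta''_n| + |\mathbf{E}\zeta''_n| \\ &\le \sqrt{\sum_{k=1}^{n} \operatorname{Var}(\xi'_{k,n})} + 2\sum_{k=1}^{n} \mathbf{E}|\xi''_{k,n}| \le \sqrt{\sum_{k=1}^{n} \mathbf{E}(\xi'_{k,n})^2} + 2\sum_{k=1}^{n} \mathbf{E}|\xi''_{k,n}| \\ &= \Bigl[\sum_{k=1}^{n} \mathbf{E}\bigl(\xi_{k,n}^2;\ |\xi_{k,n}| \le 1\bigr)\Bigr]^{1/2} + 2\sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|;\ |\xi_{k,n}| > 1\bigr) \le \sqrt{D_1} + 2D_1 \to 0, \end{aligned} \]

if $D_1 \to 0$. The theorem is proved. □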
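The bound $\mathbf{E}|\zeta_n| \le \sqrt{D_1} + 2D_1$ obtained in the proof can be checked exactly on a simple array. The sketch below is only an illustration under an assumption that is not in the text: it takes $\xi_{k,n} = \pm 1/n$ with probability $1/2$ each, for which $D_1 = n \cdot n^{-2} = 1/n$ and $\mathbf{E}|\zeta_n|$ is a finite binomial sum. The function names are hypothetical.

```python
from math import comb, sqrt

def mean_abs_zeta(n: int) -> float:
    """Exact E|zeta_n| for zeta_n = (1/n) * (sum of n independent +-1 signs)."""
    # zeta_n = (2K - n) / n with K ~ Binomial(n, 1/2); sum over all values of K.
    total = sum(abs(2 * k - n) * comb(n, k) for k in range(n + 1))
    return total / (n * 2 ** n)

def d1(n: int) -> float:
    """D_1 = sum_k E min(|xi|, xi^2) when |xi_{k,n}| = 1/n for every k."""
    return n * min(1 / n, 1 / n ** 2)

for n in (10, 100, 1000):
    lhs = mean_abs_zeta(n)
    rhs = sqrt(d1(n)) + 2 * d1(n)
    print(n, lhs, rhs)  # lhs stays below rhs, and both tend to 0
```

For this array both sides decay like $n^{-1/2}$, so the bound is of the right order here.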


Remark 8.3.1 It can be seen from the proof of Theorem 8.3.2 that the argument will remain valid if we replace the independence of $\xi_{k,n}$ by the weaker condition that $\xi'_{k,n}$ are non-correlated. It will also be valid if $\xi'_{k,n}$ are only weakly correlated so that

\[ \mathbf{E}(\zeta'_n - \mathbf{E}\zeta'_n)^2 \le c \sum_{k=1}^{n} \operatorname{Var}(\xi'_{k,n}), \qquad c < \infty. \]


If $\{\xi_k\}$ is a given fixed (not dependent on $n$) sequence of independent random variables, $S_n = \sum_{k=1}^{n} \xi_k$ and $\mathbf{E}\xi_k = a_k$, then one looks at the applicability of the law of large numbers to the sequences

\[ \xi_{k,n} = \frac{\xi_k - a_k}{b(n)}, \qquad \zeta_n = \sum_{k=1}^{n} \xi_{k,n} = \frac{1}{b(n)}\Bigl(S_n - \sum_{k=1}^{n} a_k\Bigr), \tag{8.3.7} \]

where $\xi_{k,n}$ satisfy (8.3.1), and $b(n)$ is an unboundedly increasing sequence. In some cases it is natural to take $b(n) = \sum_{k=1}^{n} \mathbf{E}|\xi_k|$ if this sum increases unboundedly. Without loss of generality we can set $a_k = 0$. The next assertion follows from Theorem 8.3.2.


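Corollary 8.3.1 If, as $n \to \infty$,

\[ \frac{1}{b(n)} \sum_{k=1}^{n} \mathbf{E} \min\bigl(|\xi_k|,\ \xi_k^2 / b(n)\bigr) \to 0 \]

or, for any $\tau > 0$,

\[ M_1(\tau) = \frac{1}{b(n)} \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_k|;\ |\xi_k| > \tau b(n)\bigr) \to 0, \qquad b(n) = \sum_{k=1}^{n} \mathbf{E}|\xi_k| \to \infty, \tag{8.3.8} \]

then $\zeta_n \xrightarrow{(1)} 0$.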


Now we will present an important sufficient condition for the law of large numbers that is very close to condition (8.3.8) and which explains to some extent its essence. In addition, in many cases this condition is easier to check. Let $b_k = \mathbf{E}|\xi_k|$, $\bar{b}_n = \max_{k \le n} b_k$, and, as before,

\[ S_n = \sum_{k=1}^{n} \xi_k, \qquad b(n) = \sum_{k=1}^{n} b_k. \]

The following assertion is a direct generalisation of Theorem 8.1.1 and Corollary 8.1.1.

Theorem 8.3.3 Let $\mathbf{E}\xi_k = 0$, the sequence of normalised random variables $\xi_k / b_k$ be uniformly integrable and $\bar{b}_n = o(b(n))$ as $n \to \infty$. Then

\[ \frac{S_n}{b(n)} \xrightarrow{(1)} 0. \]

If $\bar{b}_n \le b < \infty$ then $b(n) \le bn$ and $S_n / n \xrightarrow{(1)} 0$.

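Proof Since

\[ \mathbf{E}\bigl(|\xi_k|;\ |\xi_k| > \tau b(n)\bigr) \le b_k\, \mathbf{E}\Bigl(\Bigl|\frac{\xi_k}{b_k}\Bigr|;\ \Bigl|\frac{\xi_k}{b_k}\Bigr| > \tau \frac{b(n)}{\bar{b}_n}\Bigr) \tag{8.3.9} \]

and $b(n)/\bar{b}_n \to \infty$, the uniform integrability of $\{\xi_k / b_k\}$ implies that the right-hand side of (8.3.9) is $o(b_k)$ uniformly in $k$ (i.e. it admits a bound $\varepsilon(n) b_k$, where $\varepsilon(n) \to 0$ as $n \to \infty$ and does not depend on $k$). Therefore

\[ M_1(\tau) = \frac{1}{b(n)} \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_k|;\ |\xi_k| > \tau b(n)\bigr) \to 0 \]

as $n \to \infty$, and condition (8.3.8) is met. The theorem is proved. □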


Remark 8.3.2 If, in the context of the law of large numbers, we are interested in convergence in probability only, then we can generalise Theorem 8.3.3. In particular, convergence

\[ \frac{S_n}{b(n)} \xrightarrow{p} 0 \]

will still hold if a finite number of the summands $\xi_k$ (e.g., for $k \le l$, $l$ being fixed) are completely arbitrary (they can even fail to have expectations) and the sequence $\xi^*_k = \xi_{k+l}$, $k \ge 1$, satisfies the conditions of Theorem 8.3.3, where $b(n)$ is defined for the variables $\xi^*_k$ and has the property $b(n-1)/b(n) \to 1$ as $n \to \infty$.

This assertion follows from the fact that

\[ \frac{S_n}{b(n)} = \frac{S_l}{b(n)} + \frac{S_n - S_l}{b(n-l)} \cdot \frac{b(n-l)}{b(n)}, \qquad \frac{S_l}{b(n)} \xrightarrow{p} 0, \qquad \frac{b(n-l)}{b(n)} \to 1, \]

and by Theorem 8.3.3

\[ \frac{S_n - S_l}{b(n-l)} \xrightarrow{p} 0 \quad \text{as } n \to \infty. \]

Now we will show that the uniform integrability condition in Theorem 8.3.3 (as well as condition $M_1(\tau) \to 0$) is essential for convergence $\zeta_n \xrightarrow{p} 0$. Consider a sequence of random variables

\[ \xi_j = \begin{cases} 2^s - 1 & \text{with probability } 2^{-s}, \\ -1 & \text{with probability } 1 - 2^{-s} \end{cases} \]

for $j \in I_s := (2^{s-1}, 2^s]$, $s = 1, 2, \ldots$; $\xi_1 = 0$. Then $\mathbf{E}\xi_j = 0$, $\mathbf{E}|\xi_j| = 2(1 - 2^{-s})$ for $j \in I_s$, and, for $n = 2^k$, one has

\[ b(n) = \sum_{s=1}^{k} 2(1 - 2^{-s}) |I_s|, \]

where $|I_s| = 2^s - 2^{s-1} = 2^{s-1}$ is the number of points in $I_s$. Hence, as $k \to \infty$,

\[ b(n) = 2\bigl[(1 - 2^{-k}) 2^{k-1} + (1 - 2^{-k+1}) 2^{k-2} + \cdots\bigr] \sim 2^k + 2^{k-1} + \cdots \sim 2^{k+1} = 2n. \]

Observe that the uniform integrability condition is clearly not met here. The distribution of the number of jumps of magnitude $2^s - 1$ on the interval $I_s$ converges, as $s \to \infty$, to the Poisson distribution with parameter $1/2 = \lim_{s \to \infty} 2^{-s} |I_s|$, while the distribution of $2^{-s}(S_{2^s} - S_{2^{s-1}})$ converges to the distribution of $\nu - 1/2$, where $\nu \mathrel{\subset\!\!\!=} \boldsymbol{\Pi}_{1/2}$. Hence, assuming that $n = 2^k$, and partitioning the segment $[2, n]$ into the intervals $(2^{s-1}, 2^s]$, $s = 1, \ldots, k$, we obtain that the distribution of $S_n / n$ converges, as $k \to \infty$, to the distribution of $\zeta$:

\[ \frac{S_n}{n} = 2^{-k} \sum_{s=1}^{k} \bigl(S_{2^s} - S_{2^{s-1}}\bigr) = \sum_{s=1}^{k} \frac{S_{2^s} - S_{2^{s-1}}}{2^s}\, 2^{s-k} \Rightarrow \sum_{l=0}^{\infty} (\nu_l - 1/2)\, 2^{-l} =: \zeta, \]

where $\nu_l$, $l = 0, 1, \ldots$, are independent copies of $\nu$. Clearly, $\zeta \not\equiv 0$, and so convergence $S_n / n \xrightarrow{p} 0$ fails to take place.
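As a consistency check on this example, one can compute $\operatorname{Var}(S_n/n)$ exactly for $n = 2^k$ and compare it with the variance of the limit $\sum_{l \ge 0} (\nu_l - 1/2) 2^{-l}$, which equals $\frac{1}{2} \sum_{l \ge 0} 4^{-l} = 2/3$ when $\nu_l$ is Poisson with parameter $1/2$. The sketch below assumes the reading of the example given above ($\xi_j = 2^s - 1$ with probability $2^{-s}$ and $-1$ otherwise for $j \in (2^{s-1}, 2^s]$, $\xi_1 = 0$); the function name is hypothetical.

```python
from fractions import Fraction

def var_normalised_sum(k: int) -> Fraction:
    """Exact Var(S_n / n) for n = 2**k in the example (xi_1 = 0 contributes nothing)."""
    n = 2 ** k
    var_sn = Fraction(0)
    for s in range(1, k + 1):
        block_size = 2 ** (s - 1)                 # |I_s|
        p = Fraction(1, 2 ** s)                   # P(xi_j = 2^s - 1) for j in I_s
        second_moment = (2 ** s - 1) ** 2 * p + (1 - p)  # E xi_j^2; E xi_j = 0
        var_sn += block_size * second_moment      # independent summands
    return var_sn / n ** 2

for k in (5, 10, 20):
    print(k, float(var_normalised_sum(k)))        # approaches 2/3, not 0
```

A variance that stays away from zero does not by itself rule out convergence in probability, but it agrees with the non-degenerate limit law identified in the text.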

Let us return to arbitrary $\xi_{k,n}$. In order for [D_1] to hold it suffices that the following condition is met: for some $s$, $2 \ge s > 1$,

\[ \sum_{k=1}^{n} \mathbf{E}|\xi_{k,n}|^s \to 0. \tag{[L_s]} \]

This assertion is evident, since $g_1(x) \le |x|^s$ for $2 \ge s > 1$. Conditions [L_s] could be called the modified Lyapunov conditions (cf. the Lyapunov condition [L_s] in Sect. 8.4).

To prove Theorem 8.3.2, we used the so-called “truncated versions” $\xi'_{k,n}$ of the random variables $\xi_{k,n}$. Now we will consider yet another variant of the law of large numbers, in which conditions are expressed in terms of truncated random variables. Denote by $\xi^{(N)}$ the result of truncation of the random variable $\xi$ at level $N$:

\[ \xi^{(N)} = \max[-N, \min(N, \xi)]. \]


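Theorem 8.3.4 Let the sequence of random variables $\{\xi_k\}$ in (8.3.7) satisfy the following condition: for any given $\varepsilon > 0$, there exist $N_k$ such that

\[ \frac{1}{b(n)} \sum_{k=1}^{n} \mathbf{E}\bigl|\xi_k - \xi_k^{(N_k)}\bigr| < \varepsilon, \qquad \frac{1}{b(n)} \sum_{k=1}^{n} N_k < N < \infty. \]

Then the sequence $\{\zeta_n\}$ converges to zero in mean: $\zeta_n \xrightarrow{(1)} 0$.

Proof Clearly $a_k^{(N_k)} := \mathbf{E}\xi_k^{(N_k)} \to a_k$ as $N_k \to \infty$ and $|a_k^{(N_k)} - a_k| \le \mathbf{E}|\xi_k - \xi_k^{(N_k)}|$. Further, we have

\[ \mathbf{E}|\zeta_n| = \frac{1}{b(n)} \mathbf{E}\Bigl|\sum_{k=1}^{n} (\xi_k - a_k)\Bigr| \le \frac{1}{b(n)} \sum_{k=1}^{n} \mathbf{E}\bigl|\xi_k - \xi_k^{(N_k)}\bigr| + \mathbf{E}\Bigl|\sum_{k=1}^{n} \frac{\xi_k^{(N_k)} - a_k^{(N_k)}}{b(n)}\Bigr| + \frac{1}{b(n)} \Bigl|\sum_{k=1}^{n} \bigl(a_k^{(N_k)} - a_k\bigr)\Bigr|. \]

Here the second term on the right-hand side converges to zero, since the sum under the expectation satisfies the conditions of Theorem 8.3.1 and is bounded. But the first and the last terms do not exceed $\varepsilon$. Since the left-hand side does not depend on $\varepsilon$, we have $\mathbf{E}|\zeta_n| \to 0$ as $n \to \infty$. □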


Corollary 8.3.2 If $b(n) = n$ and, for sufficiently large $N$ and all $k \le n$,

\[ \mathbf{E}\bigl|\xi_k - \xi_k^{(N)}\bigr| < \varepsilon, \]

then $\zeta_n \xrightarrow{(1)} 0$.

The corollary follows from Theorem 8.3.4, since the conditions of the corollary clearly imply the conditions of Theorem 8.3.4.

It is obvious that, for identically distributed $\xi_k$, the conditions of Corollary 8.3.2 are always met, and we again obtain a generalisation of Theorem 8.1.1 and Corollary 8.1.1.

If $\mathbf{E}|\xi_k|^r < \infty$ for $r \ge 1$, then we can also establish in a similar way that

\[ \frac{S_n}{n} \xrightarrow{(r)} a. \]

Remark 8.3.3 Condition [D_1] (or [M_1]) is not necessary for convergence $\zeta_n \xrightarrow{p} 0$ even when (8.3.2) and (8.3.5) hold, as the following example demonstrates. Let $\xi_{k,n}$ assume the values $-n$, $0$, and $n$ with probabilities $1/n^2$, $1 - 2/n^2$, and $1/n^2$, respectively. Here $\zeta_n \xrightarrow{p} 0$, since $\mathbf{P}(\zeta_n \ne 0) \le \mathbf{P}\bigl(\bigcup_k \{\xi_{k,n} \ne 0\}\bigr) \le 2/n \to 0$, $\mathbf{E}|\xi_{k,n}| = 2/n \to 0$ and $M_1 = \sum \mathbf{E}|\xi_{k,n}| = 2 < \infty$. At the same time, $\sum \mathbf{E}(|\xi_{k,n}|;\ |\xi_{k,n}| > 1) = 2 \not\to 0$, so that conditions [D_1] and [M_1] are not satisfied.
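The quantities in this example are simple enough to evaluate exactly. The following sketch (function name hypothetical, $n \ge 2$ assumed so that the probabilities are valid and $|\xi_{k,n}| = n > 1$ on the non-zero values) returns $M_1$, the tail sum $\sum_k \mathbf{E}(|\xi_{k,n}|; |\xi_{k,n}| > 1)$, and the exact probability that at least one summand is non-zero, which dominates $\mathbf{P}(\zeta_n \ne 0)$.

```python
from fractions import Fraction

def remark_8_3_3_quantities(n: int):
    """Exact values for xi_{k,n} in {-n, 0, n} with P(n) = P(-n) = 1/n**2, n >= 2."""
    p = Fraction(1, n * n)                   # P(xi = n) = P(xi = -n)
    m1 = n * (2 * n * p)                     # M_1 = sum over k of E|xi_{k,n}|
    m1_tail = n * (2 * n * p)                # |xi| > 1 exactly when xi != 0, so same sum
    p_some_nonzero = 1 - (1 - 2 * p) ** n    # independence of the n summands
    return m1, m1_tail, p_some_nonzero

for n in (10, 100, 1000):
    m1, m1_tail, p_nz = remark_8_3_3_quantities(n)
    print(n, m1, m1_tail, float(p_nz), 2 / n)
```

The tail sum stays equal to 2 for every $n$, while the probability of seeing any non-zero summand is bounded by $2/n$.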

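However, if we require that

\[ \xi_{k,n} \ge -\varepsilon_{k,n}, \qquad \varepsilon_{k,n} \ge 0, \qquad \max_{k \le n} \varepsilon_{k,n} \to 0, \qquad \sum_{k=1}^{n} \varepsilon_{k,n} \le c < \infty, \tag{8.3.10} \]

then condition [D_1] will become necessary for convergence $\zeta_n \xrightarrow{p} 0$.

Before proving that assertion we will establish several auxiliary relations that will be useful in the sequel. As above, put $\Delta_k(t) := \varphi_{k,n}(t) - 1$.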


Lemma 8.3.2 One has

\[ \sum_{k=1}^{n} |\Delta_k(t)| \le |t| M_1. \]

If condition [S] holds, then for each $t$, as $n \to \infty$,

\[ \max_{k \le n} |\Delta_k(t)| \to 0. \]

If a random variable $\xi$ with $\mathbf{E}\xi = 0$ is bounded from the left: $\xi \ge -c$, $c > 0$, then $\mathbf{E}|\xi| \le 2c$.


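Proof By Lemma 7.4.1,

\[ |\Delta_k(t)| \le \mathbf{E}\bigl|e^{it\xi_{k,n}} - 1\bigr| \le |t|\, \mathbf{E}|\xi_{k,n}|, \qquad \sum_{k=1}^{n} |\Delta_k(t)| \le |t| M_1. \]

Further,

\[ |\Delta_k(t)| \le \mathbf{E}\bigl(|e^{it\xi_{k,n}} - 1|;\ |\xi_{k,n}| \le \varepsilon\bigr) + \mathbf{E}\bigl(|e^{it\xi_{k,n}} - 1|;\ |\xi_{k,n}| > \varepsilon\bigr) \le |t|\varepsilon + 2\mathbf{P}\bigl(|\xi_{k,n}| > \varepsilon\bigr). \]

Since $\varepsilon$ is arbitrary here, the second assertion of the lemma now follows from condition [S].

Put

\[ \xi^+ := \max(0, \xi) \ge 0, \qquad \xi^- := -(\xi - \xi^+) \ge 0. \]

Then $\mathbf{E}\xi = \mathbf{E}\xi^+ - \mathbf{E}\xi^- = 0$ and $\mathbf{E}|\xi| = \mathbf{E}\xi^+ + \mathbf{E}\xi^- = 2\mathbf{E}\xi^- \le 2c$. The lemma is proved. □

From the last assertion of the lemma it follows that (8.3.10) implies (8.3.2) and (8.3.5).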

Lemma 8.3.3 Let conditions [S] and (8.3.2) be satisfied. A necessary and sufficient condition for convergence $\varphi_{\zeta_n}(t) \to \varphi(t)$ is that

\[ \sum_{k=1}^{n} \Delta_k(t) \to \ln \varphi(t). \]

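Proof Observe that

\[ \operatorname{Re} \Delta_k(t) = \operatorname{Re}\bigl(\varphi_{k,n}(t) - 1\bigr) \le 0, \]

and therefore, by Lemma 7.4.2,

\[ \begin{aligned} \bigl|\varphi_{\zeta_n}(t) - e^{\sum \Delta_k(t)}\bigr| &= \Bigl|\prod_{k=1}^{n} \varphi_{k,n}(t) - \prod_{k=1}^{n} e^{\Delta_k(t)}\Bigr| \\ &\le \sum_{k=1}^{n} \bigl|\varphi_{k,n}(t) - e^{\Delta_k(t)}\bigr| = \sum_{k=1}^{n} \bigl|e^{\Delta_k(t)} - 1 - \Delta_k(t)\bigr| \\ &\le \frac{1}{2} \sum_{k=1}^{n} |\Delta_k(t)|^2 \le \frac{1}{2} \max_{k \le n} |\Delta_k(t)| \sum_{k=1}^{n} |\Delta_k(t)|. \end{aligned} \]

By Lemma 8.3.2 and conditions [S] and (8.3.2), the expression on the left-hand side converges to 0 as $n \to \infty$. Therefore, if $\varphi_{\zeta_n}(t) \to \varphi(t)$ then $\exp\{\sum \Delta_k(t)\} \to \varphi(t)$, and vice versa. The lemma is proved. □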


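The next assertion complements Theorem 8.3.1.

Theorem 8.3.5 Assume that relations (8.3.1) and (8.3.10) hold. Then condition [D_1] (or condition [M_1]) is necessary for the law of large numbers.

Proof If the law of large numbers holds then $\varphi_{\zeta_n}(t) \to 1$ and, hence by Lemma 8.3.3 (recall that (8.3.10) implies (8.3.2), (8.3.5) and [S])

\[ \sum_{k=1}^{n} \Delta_k(t) = \sum_{k=1}^{n} \mathbf{E}\bigl(e^{it\xi_{k,n}} - 1 - it\xi_{k,n}\bigr) \to 0. \]

Moreover, by Lemma 7.4.1

\[ \sum_{k=1}^{n} \bigl|\mathbf{E}\bigl(e^{it\xi_{k,n}} - 1 - it\xi_{k,n};\ |\xi_{k,n}| \le \varepsilon_{k,n}\bigr)\bigr| \le \frac{t^2}{2} \sum_{k=1}^{n} \mathbf{E}\bigl(\xi_{k,n}^2;\ |\xi_{k,n}| \le \varepsilon_{k,n}\bigr) \le \frac{t^2}{2} \sum_{k=1}^{n} \varepsilon_{k,n}^2 \le \frac{t^2}{2} \max_{k \le n} \varepsilon_{k,n} \sum_{k=1}^{n} \varepsilon_{k,n} \to 0. \]

Therefore, if the law of large numbers holds, then by virtue of (8.3.10)

\[ \sum_{k=1}^{n} \mathbf{E}\bigl(e^{it\xi_{k,n}} - 1 - it\xi_{k,n};\ \xi_{k,n} > \varepsilon_{k,n}\bigr) \to 0. \]

Consider the function $\alpha(x) = (e^{ix} - 1)/(ix)$. It is not hard to see that the inequality $|\alpha(x)| \le 1$ proved in Lemma 7.4.1 is strict for $x > \varepsilon > 0$, and hence there exists a $\delta(\tau) > 0$ for $\tau > 0$ such that $\operatorname{Re}(1 - \alpha(x)) \ge \delta(\tau)$ for $x > \tau$. This is equivalent to $\operatorname{Im}(1 + ix - e^{ix}) \ge \delta(\tau) x$, so that

\[ x \le \frac{1}{\delta(\tau)} \operatorname{Im}\bigl(1 + ix - e^{ix}\bigr) \quad \text{for } x > \tau. \]

From this we find (taking $t = 1$ and $n$ so large that $\max_{k \le n} \varepsilon_{k,n} < \tau$) that

\[ M_1(\tau) = \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|;\ |\xi_{k,n}| > \tau\bigr) = \sum_{k=1}^{n} \mathbf{E}\bigl(\xi_{k,n};\ \xi_{k,n} > \tau\bigr) \le \frac{1}{\delta(\tau)} \operatorname{Im} \sum_{k=1}^{n} \mathbf{E}\bigl(1 + i\xi_{k,n} - e^{i\xi_{k,n}};\ \xi_{k,n} > \varepsilon_{k,n}\bigr) \to 0. \]

Thus condition [M_1] holds. Together with relation (8.3.2), that follows from (8.3.10), this condition implies [D_1]. The theorem is proved. □

There seem to exist some conditions that are wider than (8.3.10) and under which condition [D_1] is necessary for convergence $\zeta_n \to 0$ in mean (condition (8.3.10) is too restrictive).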


8.4  The  Central  Limit  Theorem  for  Sums  of  Arbitrary 
Independent  Random  Variables 

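As in Sect. 8.3, we consider here a triangular array of random variables $\xi_{1,n}, \ldots, \xi_{n,n}$ and their sums

\[ \zeta_n = \sum_{k=1}^{n} \xi_{k,n}. \tag{8.4.1} \]

We will assume that $\xi_{k,n}$ have finite second moments:

\[ \sigma_{k,n}^2 := \operatorname{Var}(\xi_{k,n}) < \infty, \]

and suppose, without loss of generality, that

\[ \mathbf{E}\xi_{k,n} = 0, \qquad \sum_{k=1}^{n} \sigma_{k,n}^2 = \operatorname{Var}(\zeta_n) = 1. \tag{8.4.2} \]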

We introduce the following condition: for some $s > 2$,

\[ D_2 := \sum_{k=1}^{n} \mathbf{E} \min\bigl(\xi_{k,n}^2,\ |\xi_{k,n}|^s\bigr) \to 0 \quad \text{as } n \to \infty, \tag{[D_2]} \]

which is to play an important role in what follows. Our arguments related to condition [D_2] and also to conditions [M_2] and [L_s] to be introduced below will be quite similar to the ones from Sect. 8.3 that were related to conditions [D_1], [M_1] and [L_s].

We also introduce the Lindeberg condition: for any $\tau > 0$, as $n \to \infty$,

\[ M_2(\tau) := \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|^2;\ |\xi_{k,n}| > \tau\bigr) \to 0. \tag{[M_2]} \]

The  following  assertion  is  an  analogue  of  Lemma  8.3.1. 

Lemma 8.4.1 1. $\{[M_2] \cap (8.4.2)\} \subset [D_2]$. 2. $[D_2] \subset [M_2]$.

That is, conditions [M_2] and (8.4.2) imply [D_2], and condition [D_2] implies [M_2].

From Lemma 8.4.1 it follows that, under condition (8.4.2), conditions [D_2] and [M_2] are equivalent.

Proof of Lemma 8.4.1 1. Let conditions [M_2] and (8.4.2) be met. Put

\[ g_2(x) := \min(x^2, |x|^s), \qquad s > 2. \]

Then (cf. (8.3.3), (8.3.4); $\tau \le 1$)

\[ D_2 = \sum_{k=1}^{n} \mathbf{E} g_2(\xi_{k,n}) \le \sum_{k=1}^{n} \mathbf{E}\bigl(\xi_{k,n}^2;\ |\xi_{k,n}| > \tau\bigr) + \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|^s;\ |\xi_{k,n}| \le \tau\bigr) \le M_2(\tau) + \tau^{s-2} M_2(0) = M_2(\tau) + \tau^{s-2}. \]

Since $\tau$ is arbitrary, we have $D_2 \to 0$ as $n \to \infty$.

2. Conversely, suppose that [D_2] holds. Then

\[ M_2(\tau) \le \sum_{k=1}^{n} \mathbf{E}\bigl(\xi_{k,n}^2;\ |\xi_{k,n}| > 1\bigr) + \frac{1}{\tau^{s-2}} \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}|^s;\ \tau < |\xi_{k,n}| \le 1\bigr) \le \frac{1}{\tau^{s-2}} D_2 \to 0 \]

for any $\tau > 0$, as $n \to \infty$. The lemma is proved. □


Lemma 8.4.1 also implies that if (8.4.2) holds, then condition [D_2] is “invariant” with respect to $s > 2$.

Condition [D_2] can be stated in a more general form:

\[ \sum_{k=1}^{n} \mathbf{E}\, \xi_{k,n}^2\, h(|\xi_{k,n}|) \to 0, \]

where $h(x)$ is any function for which $h(x) > 0$ for $x > 0$, $h(x) \uparrow$, $h(x) \to 0$ as $x \to 0$, and $h(x) \to c < \infty$ as $x \to \infty$. All the key properties of condition [D_2] will then be preserved. The Lindeberg condition clarifies the meaning of condition [D_2] from a somewhat different point of view. In Lindeberg's condition, $h(x) = I_{(\tau, \infty)}(x)$, $\tau \in (0, 1)$. A similar remark may be made with regard to conditions [D_1] and [M_1] in Sect. 8.3.

In a way similar to what we did in Sect. 8.3 when discussing condition [M_1], one can easily verify that condition [M_2] implies convergence (see (8.3.6))

\[ \max_{k \le n} \operatorname{Var}(\xi_{k,n}) \to 0 \tag{8.4.3} \]

and the negligibility of $\xi_{k,n}$ (property [S]). Moreover, one obviously has the inequality

\[ M_1(\tau) \le \frac{1}{\tau} M_2(\tau). \]

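For a given fixed (independent of $n$) sequence $\{\xi_k\}$ of independent random variables,

\[ S_n = \sum_{k=1}^{n} \xi_k, \qquad \mathbf{E}\xi_k = a_k, \qquad \operatorname{Var}(\xi_k) = \sigma_k^2, \tag{8.4.4} \]

one considers the asymptotic behaviour of the normed sums

\[ \zeta_n = \frac{1}{B_n}\Bigl(S_n - \sum_{k=1}^{n} a_k\Bigr), \qquad B_n^2 = \sum_{k=1}^{n} \sigma_k^2, \tag{8.4.5} \]

that are clearly also of the form (8.4.1) with $\xi_{k,n} = (\xi_k - a_k)/B_n$. Conditions [D_2] and [M_2] for $\xi_k$ will take the form

\[ \begin{aligned} D_2 &= \frac{1}{B_n^2} \sum_{k=1}^{n} \mathbf{E} \min\Bigl((\xi_k - a_k)^2,\ \frac{|\xi_k - a_k|^s}{B_n^{s-2}}\Bigr) \to 0, \qquad s > 2; \\ M_2(\tau) &= \frac{1}{B_n^2} \sum_{k=1}^{n} \mathbf{E}\bigl((\xi_k - a_k)^2;\ |\xi_k - a_k| > \tau B_n\bigr) \to 0, \qquad \tau > 0. \end{aligned} \tag{8.4.6} \]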


Theorem 8.4.1 (The Central Limit Theorem) If the sequences of random variables $\{\xi_{k,n}\}_{k=1}^{n}$, $n = 1, 2, \ldots$, satisfy conditions (8.4.2) and [D_2] (or [M_2]) then, as $n \to \infty$, $\mathbf{P}(\zeta_n < x) \to \Phi(x)$ uniformly in $x$.

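Proof It suffices to verify that

\[ \varphi_{\zeta_n}(t) = \prod_{k=1}^{n} \varphi_{k,n}(t) \to e^{-t^2/2}. \]

By Lemma 7.4.2,

\[ \begin{aligned} \bigl|\varphi_{\zeta_n}(t) - e^{-t^2/2}\bigr| &= \Bigl|\prod_{k=1}^{n} \varphi_{k,n}(t) - \prod_{k=1}^{n} e^{-t^2\sigma_{k,n}^2/2}\Bigr| \le \sum_{k=1}^{n} \bigl|\varphi_{k,n}(t) - e^{-t^2\sigma_{k,n}^2/2}\bigr| \\ &\le \sum_{k=1}^{n} \Bigl|\varphi_{k,n}(t) - 1 + \frac{1}{2} t^2 \sigma_{k,n}^2\Bigr| + \sum_{k=1}^{n} \Bigl|e^{-t^2\sigma_{k,n}^2/2} - 1 + \frac{1}{2} t^2 \sigma_{k,n}^2\Bigr|. \end{aligned} \tag{8.4.7} \]

Since by Lemma 7.4.1, for $s \le 3$,

\[ \Bigl|e^{ix} - 1 - ix + \frac{x^2}{2}\Bigr| \le \min\Bigl(x^2,\ \frac{|x|^3}{6}\Bigr) \le g_2(x) \]

(see the definition of the function $g_2$ in the beginning of the proof of Lemma 8.4.1), the first sum on the right-hand side of (8.4.7) does not exceed

\[ \sum_{k=1}^{n} \Bigl|\mathbf{E}\Bigl(e^{it\xi_{k,n}} - 1 - it\xi_{k,n} + \frac{1}{2} t^2 \xi_{k,n}^2\Bigr)\Bigr| \le \sum_{k=1}^{n} \mathbf{E} g_2(|t\xi_{k,n}|) \le h(t) \sum_{k=1}^{n} \mathbf{E} g_2(|\xi_{k,n}|) \le h(t) D_2 \to 0, \]

where $h(t) = \max(t^2, |t|^3)$. The last sum in (8.4.7) (again by Lemma 7.4.1) does not exceed (see (8.4.2) and (8.4.3))

\[ \frac{t^4}{8} \sum_{k=1}^{n} \sigma_{k,n}^4 \le \frac{t^4}{8} \max_{k \le n} \sigma_{k,n}^2 \sum_{k=1}^{n} \sigma_{k,n}^2 \le \frac{t^4}{8} \max_{k \le n} \sigma_{k,n}^2 \to 0 \quad \text{as } n \to \infty. \]

The theorem is proved. □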


If we change the second relation in (8.4.2) to $\mathbf{E}\zeta_n^2 \to \sigma^2 > 0$, then, introducing the new random variables $\xi'_{k,n} = \xi_{k,n} / \sqrt{\operatorname{Var}\zeta_n}$ and using continuity theorems, it is not hard to obtain from Theorem 8.4.1 (see e.g. Lemma 6.2.2), the following assertion, which sometimes proves to be more useful in applications than Theorem 8.4.1.

Corollary 8.4.1 Assume that $\mathbf{E}\xi_{k,n} = 0$, $\operatorname{Var}(\zeta_n) \to \sigma^2 > 0$, and condition [D_2] (or [M_2]) is satisfied. Then $\zeta_n \mathrel{\subset\!\!\!\Rightarrow} \boldsymbol{\Phi}_{0,\sigma^2}$.


Remark 8.4.1 A sufficient condition for [D_2] and [M_2] is provided by the more restrictive Lyapunov condition, the verification of which is sometimes easier. Assume that (8.4.2) holds. For $s > 2$, the quantity

\[ L_s := \sum_{k=1}^{n} \mathbf{E}|\xi_{k,n}|^s \]

is called the Lyapunov fraction of the $s$-th order. The condition

\[ L_s \to 0 \quad \text{as } n \to \infty \tag{[L_s]} \]

is called the Lyapunov condition.

The quantity $L_s$ is called a fraction since for $\xi_{k,n} = (\xi_k - a_k)/B_n$ (where $a_k = \mathbf{E}\xi_k$, $B_n^2 = \sum_{k=1}^{n} \operatorname{Var}(\xi_k)$ and $\xi_k$ do not depend on $n$), it has the form

\[ L_s = \frac{1}{B_n^s} \sum_{k=1}^{n} \mathbf{E}|\xi_k - a_k|^s. \]

If the $\xi_k$ are identically distributed, $a_k = a$, $\operatorname{Var}(\xi_k) = \sigma^2$, and $\mathbf{E}|\xi_k - a|^s = \mu < \infty$, then

\[ L_s = \frac{\mu}{\sigma^s n^{(s-2)/2}} \to 0. \]

The sufficiency of the Lyapunov condition follows from the obvious inequalities $g_2(x) \le |x|^s$ for any $s$, $D_2 \le L_s$.

In the case of (8.4.4) and (8.4.5) we can give a sufficient condition for the integral limit theorem that is very close to the Lindeberg condition [M_2]; the former condition elucidates to some extent the essence of the latter (cf. Theorem 8.3.3), and in many cases it is easier to verify. Put $\bar{\sigma}_n = \max_{k \le n} \sigma_k$. Theorem 8.4.1 implies the following assertion which is a direct extension of Theorem 8.2.1.

Theorem 8.4.2 Let conditions (8.4.4) and (8.4.5) be satisfied, the sequence of normalised random variables $\xi_k^2 / \sigma_k^2$ be uniformly integrable and $\bar{\sigma}_n = o(B_n)$ as $n \to \infty$. Then $\zeta_n \mathrel{\subset\!\!\!\Rightarrow} \boldsymbol{\Phi}_{0,1}$.


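Proof of Theorem 8.4.2 repeats, to some extent, the proof of Theorem 8.3.3. For simplicity assume that $a_k = 0$. Then

\[ \mathbf{E}\bigl(\xi_k^2;\ |\xi_k| > \tau B_n\bigr) \le \sigma_k^2\, \mathbf{E}\Bigl(\frac{\xi_k^2}{\sigma_k^2};\ \Bigl|\frac{\xi_k}{\sigma_k}\Bigr| > \tau \frac{B_n}{\bar{\sigma}_n}\Bigr), \tag{8.4.8} \]

where $B_n / \bar{\sigma}_n \to \infty$. Hence, it follows from the uniform integrability of $\{\xi_k^2 / \sigma_k^2\}$ that the right-hand side of (8.4.8) is $o(\sigma_k^2)$ uniformly in $k$. This means that

\[ M_2(\tau) = \frac{1}{B_n^2} \sum_{k=1}^{n} \mathbf{E}\bigl(\xi_k^2;\ |\xi_k| > \tau B_n\bigr) \to 0 \]

as $n \to \infty$ and condition (8.4.6) (or condition [M_2]) is satisfied. The theorem is proved. □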


Remark 8.4.2 We can generalise the assertion of Theorem 8.4.2 (cf. Remark 8.3.2). In particular, convergence $\zeta_n \mathrel{\subset\!\!\!\Rightarrow} \boldsymbol{\Phi}_{0,1}$ still takes place if a finite number of summands $\xi_k$ (e.g., for $k \le l$, $l$ being fixed) are completely arbitrary, and the sequence $\xi^*_k := \xi_{k+l}$, $k \ge 1$, satisfies the conditions of Theorem 8.4.2, in which we put $\sigma_k^2 = \operatorname{Var}(\xi^*_k)$, $B_n^2 = \sum_{k=1}^{n} \sigma_k^2$, and it is also assumed that $B_{n-1}/B_n \to 1$ as $n \to \infty$.

This assertion follows from the fact that

\[ \frac{S_n}{B_n} = \frac{S_l}{B_n} + \frac{S_n - S_l}{B_{n-l}} \cdot \frac{B_{n-l}}{B_n}, \]

where $S_l / B_n \xrightarrow{p} 0$, $B_{n-l}/B_n \to 1$ and, by Theorem 8.4.2, $(S_n - S_l)/B_{n-l} \mathrel{\subset\!\!\!\Rightarrow} \boldsymbol{\Phi}_{0,1}$ as $n \to \infty$.

Remark 8.4.3 The uniform integrability condition that was used in Theorem 8.4.2 can be used for the triangular array scheme as well. In this more general case the uniform integrability should mean the following: the sequences $\eta_{1,n}, \ldots, \eta_{n,n}$, $n = 1, 2, \ldots$, in the triangular array scheme are uniformly integrable if there exists a function $\varepsilon(N) \downarrow 0$ as $N \uparrow \infty$ such that, for all $n$,

\[ \max_{j \le n} \mathbf{E}\bigl(|\eta_{j,n}|;\ |\eta_{j,n}| > N\bigr) \le \varepsilon(N). \]

It is not hard to see that, with such an interpretation of uniform integrability, the assertion of Theorem 8.4.2 holds true for the triangular array scheme as well provided that the sequence $\{\xi_{j,n}^2 / \sigma_{j,n}^2\}$ is uniformly integrable and $\max_{j \le n} \sigma_{j,n} = o(1)$ as $n \to \infty$.


Example 8.4.1 We will clarify the difference between the Lindeberg condition and uniform integrability of $\{\xi_k^2 / \sigma_k^2\}$ in the following example. Let $\eta_k$ be independent bounded identically distributed random variables, $\mathbf{E}\eta_k = 0$, $\operatorname{Var}(\eta_k) = 1$ and $g(k) > \sqrt{2}$ be an arbitrary function. Put

\[ \xi_k := \begin{cases} \eta_k & \text{with probability } 1 - 2g^{-2}(k), \\ \pm g(k) & \text{with probability } g^{-2}(k) \text{ each}. \end{cases} \]

Then clearly $\mathbf{E}\xi_k = 0$, $\sigma_k^2 := \operatorname{Var}(\xi_k) = 3 - 2g^{-2}(k) \in (2, 3)$ and $B_n^2 \in (2n, 3n)$. The uniform integrability of $\{\xi_k^2 / \sigma_k^2\}$, or the uniform integrability of $\{\xi_k^2\}$ which means the same in our case, excludes the case where $g(k) \to \infty$ as $k \to \infty$. The Lindeberg condition is wider and allows the growth of $g(k)$, except for the case where $g(k) > c\sqrt{k}$. If $g(k) = o(\sqrt{k})$, then the Lindeberg condition is satisfied because, for any fixed $\tau > 0$,

\[ \mathbf{E}\bigl(\xi_k^2;\ |\xi_k| > \tau\sqrt{k}\bigr) = 0 \]

for all large enough $k$.


Remark 8.4.4 Let us show that condition [M_2] (or [D_2]) is essential for the central limit theorem. Consider random variables

\[ \xi_{k,n} = \begin{cases} \pm \dfrac{1}{\sqrt{2}} & \text{with probability } \dfrac{1}{n} \text{ each}, \\ 0 & \text{with probability } 1 - \dfrac{2}{n}. \end{cases} \]

They satisfy conditions (8.4.2), [S], but not the Lindeberg condition as $M_2(\tau) = 1$ for $\tau < 1/\sqrt{2}$. The number $\nu_n$ of non-zero summands converges in distribution to a random variable $\nu$ having the Poisson distribution with parameter 2. Therefore, $\zeta_n$ will clearly converge in distribution not to the normal law, but to $\frac{1}{\sqrt{2}} \sum_{j=1}^{\nu} \gamma_j$, where $\gamma_j$ are independent and take values $\pm 1$ with probability 1/2.

Note also that conditions [D_2] or [M_2] are not necessary for convergence of the distributions of $\zeta_n$ to the normal distribution. Indeed, consider the following example: $\xi_{1,n} \mathrel{\subset\!\!\!=} \boldsymbol{\Phi}_{0,1}$, $\xi_{2,n} = \cdots = \xi_{n,n} = 0$. Conditions (8.4.2) are clearly met, $\mathbf{P}(\zeta_n < x) = \Phi(x)$, but the variables $\xi_{k,n}$ are not negligible and therefore do not satisfy conditions [D_2] and [M_2].

If, however, as well as convergence $\zeta_n \mathrel{\subset\!\!\!\Rightarrow} \boldsymbol{\Phi}_{0,1}$ we require that the $\xi_{k,n}$ are negligible, then conditions [D_2] and [M_2] become necessary.
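Returning to the array of Remark 8.4.4, the failure of normal convergence can be seen directly on characteristic functions. The sketch below assumes the array reads $\xi_{k,n} = \pm 1/\sqrt{2}$ with probability $1/n$ each and $0$ otherwise, so that $\varphi_{\zeta_n}(t) = \bigl(1 - \tfrac{2}{n} + \tfrac{2}{n}\cos(t/\sqrt{2})\bigr)^n$, and compares it with the compound Poisson characteristic function $\exp\{2(\cos(t/\sqrt{2}) - 1)\}$ and with $e^{-t^2/2}$. The function names are hypothetical.

```python
from math import cos, exp, sqrt

def chf_zeta(t: float, n: int) -> float:
    """Ch.f. of zeta_n: product of n identical factors E exp(i t xi_{k,n})."""
    return (1 - 2 / n + (2 / n) * cos(t / sqrt(2))) ** n

def chf_compound_poisson(t: float) -> float:
    """Ch.f. of (1/sqrt 2) * (gamma_1 + ... + gamma_nu), nu ~ Poisson(2), gamma_j = +-1."""
    return exp(2 * (cos(t / sqrt(2)) - 1))

def chf_normal(t: float) -> float:
    """Ch.f. of the standard normal law."""
    return exp(-t * t / 2)

t = 2.0
for n in (10, 1000, 10 ** 6):
    print(n, chf_zeta(t, n), chf_compound_poisson(t), chf_normal(t))
```

As $n$ grows the first column settles on the compound Poisson value, which stays visibly away from the normal one at $t = 2$.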

Theorem 8.4.3 Suppose that the sequences of independent random variables $\{\xi_{k,n}\}_{k=1}^{n}$ satisfy conditions (8.4.2) and [S]. Then condition [D_2] (or [M_2]) is necessary and sufficient for convergence $\zeta_n \mathrel{\subset\!\!\!\Rightarrow} \boldsymbol{\Phi}_{0,1}$.

First note that the assertions of Lemmas 8.3.2 and 8.3.3 remain true, up to some inessential modifications, if we substitute conditions (8.3.2) and [S] with (8.4.2) and [S].


Lemma 8.4.2 Let conditions (8.4.2) and [S] hold. Then ($\Delta_k(t) = \varphi_{k,n}(t) - 1$)

\[ \max_{k \le n} |\Delta_k(t)| \to 0, \qquad \sum_{k=1}^{n} |\Delta_k(t)| \le \frac{t^2}{2}, \]

and the assertion of Lemma 8.3.3, that the convergence $\sum_{k=1}^{n} \Delta_k(t) \to \ln \varphi(t)$ is necessary and sufficient for convergence $\varphi_{\zeta_n}(t) \to \varphi(t)$, remains completely true.

Proof We can retain all the arguments in the proofs of Lemmas 8.3.2 and 8.3.3 except for one place where $\sum |\Delta_k(t)|$ is bounded. Under the new conditions, by Lemma 7.4.1, we have

\[ |\Delta_k(t)| = \bigl|\varphi_{k,n}(t) - 1 - it\mathbf{E}\xi_{k,n}\bigr| \le \frac{t^2}{2} \mathbf{E}\xi_{k,n}^2, \]

so that

\[ \sum_{k=1}^{n} |\Delta_k(t)| \le \frac{t^2}{2}. \]

No other changes in the proofs of Lemmas 8.3.2 and 8.3.3 are needed. □


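Proof of Theorem 8.4.3 Sufficiency is already proved. To prove necessity, we make use of Lemma 8.4.2. If $\varphi_{\zeta_n}(t) \to e^{-t^2/2}$, then by virtue of that lemma, for $\Delta_k(t) = \varphi_{k,n}(t) - 1$, one has

\[ \sum_{k=1}^{n} \Delta_k(t) \to \ln \varphi(t) = -\frac{t^2}{2}. \]

For $t = 1$ the above relation can be written in the form

\[ R_n := \sum_{k=1}^{n} \mathbf{E}\Bigl(e^{i\xi_{k,n}} - 1 - i\xi_{k,n} + \frac{1}{2}\xi_{k,n}^2\Bigr) \to 0. \tag{8.4.9} \]

Put $\alpha(x) := (e^{ix} - 1 - ix)/x^2$. It is not hard to see that the inequality $|\alpha(x)| \le 1/2$ proved in Lemma 7.4.1 is strict for $x \ne 0$, and

\[ \sup_{|x| \ge \tau} |\alpha(x)| \le \frac{1}{2} - \delta(\tau), \]

where $\delta(\tau) > 0$ for $\tau > 0$. This means that, for $|x| > \tau > 0$,

\[ \operatorname{Re}\Bigl(\alpha(x) + \frac{1}{2}\Bigr) \ge \delta(\tau) > 0, \qquad x^2 \le \frac{1}{\delta(\tau)} \operatorname{Re}\Bigl(e^{ix} - 1 - ix + \frac{x^2}{2}\Bigr), \]

\[ \mathbf{E}\bigl(\xi_{k,n}^2;\ |\xi_{k,n}| > \tau\bigr) \le \frac{1}{\delta(\tau)} \operatorname{Re} \mathbf{E}\Bigl(e^{i\xi_{k,n}} - 1 - i\xi_{k,n} + \frac{1}{2}\xi_{k,n}^2\Bigr), \]

and hence by virtue of (8.4.9), for any $\tau > 0$,

\[ M_2(\tau) \le \frac{1}{\delta(\tau)} |R_n| \to 0 \]

as $n \to \infty$. The theorem is proved. □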


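Corollary 8.4.2 Assume that (8.4.2) holds and

\[ \max_{k \le n} \operatorname{Var}(\xi_{k,n}) \to 0. \tag{8.4.10} \]

Then a necessary and sufficient condition for convergence $\zeta_n \mathrel{\subset\!\!\!\Rightarrow} \boldsymbol{\Phi}_{0,1}$ is that

\[ \eta_n := \sum_{k=1}^{n} \xi_{k,n}^2 \mathrel{\subset\!\!\!\Rightarrow} \mathbf{I}_1 \]

(or that $\eta_n \xrightarrow{p} 1$).

Proof Let $\eta_n \mathrel{\subset\!\!\!\Rightarrow} \mathbf{I}_1$. The random variables $\xi'_{k,n} := \xi_{k,n}^2 - \sigma_{k,n}^2$ satisfy, by virtue of (8.4.10), condition (8.3.10) and satisfy the law of large numbers:

\[ \sum_{k=1}^{n} \xi'_{k,n} = \eta_n - 1 \xrightarrow{p} 0. \]

Therefore, by Theorem 8.3.5, the $\xi'_{k,n}$ satisfy condition [M_1]: for any $\tau > 0$,

\[ \sum_{k=1}^{n} \mathbf{E}\bigl(|\xi_{k,n}^2 - \sigma_{k,n}^2|;\ |\xi_{k,n}^2 - \sigma_{k,n}^2| > \tau\bigr) \to 0. \tag{8.4.11} \]

But by (8.4.10) this condition is clearly equivalent to condition [M_2] for $\xi_{k,n}$, and hence $\zeta_n \mathrel{\subset\!\!\!\Rightarrow} \boldsymbol{\Phi}_{0,1}$.

Conversely, if $\zeta_n \mathrel{\subset\!\!\!\Rightarrow} \boldsymbol{\Phi}_{0,1}$, then [M_2] holds for $\xi_{k,n}$ which implies (8.4.11). Since, moreover,

\[ \sum_{k=1}^{n} \mathbf{E}|\xi'_{k,n}| \le 2 \sum_{k=1}^{n} \operatorname{Var}(\xi_{k,n}) = 2, \]

relation (8.3.2) holds for $\xi'_{k,n}$, and by Theorem 8.3.1

\[ \sum_{k=1}^{n} \xi'_{k,n} \xrightarrow{p} 0. \]

The corollary is proved. □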


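Example 8.4.2 Let $\xi_k$, $k = 1, 2, \ldots$, be independent random variables with distributions

\[ \mathbf{P}(\xi_k = k^{\alpha}) = \mathbf{P}(\xi_k = -k^{\alpha}) = \frac{1}{2}. \]

Evidently, $\xi_k$ can be represented as $\xi_k = k^{\alpha} \eta_k$, where $\eta_k \stackrel{d}{=} \eta$ are independent,

\[ \mathbf{P}(\eta = 1) = \mathbf{P}(\eta = -1) = \frac{1}{2}, \qquad \operatorname{Var}(\eta) = 1, \qquad \sigma_k^2 = \operatorname{Var}(\xi_k) = k^{2\alpha}. \]

Let us show that, for all $\alpha \ge -1/2$, the random variables $S_n / B_n$ are asymptotically normal. Since

\[ \frac{\xi_k^2}{\sigma_k^2} \stackrel{d}{=} \eta^2 \]

are uniformly integrable, by Theorem 8.4.2 it suffices to verify the condition

\[ \bar{\sigma}_n = \max_{k \le n} \sigma_k = o(B_n). \]

In our case $\bar{\sigma}_n^2 = \max(1, n^{2\alpha})$ and, for $\alpha > -1/2$,

\[ B_n^2 = \sum_{k=1}^{n} k^{2\alpha} \sim \int_0^n x^{2\alpha}\, dx = \frac{n^{2\alpha+1}}{2\alpha + 1}. \]

For $\alpha = -1/2$, one has

\[ B_n^2 = \sum_{k=1}^{n} k^{-1} \sim \ln n. \]

Clearly, in these cases $\bar{\sigma}_n = o(B_n)$ and the asymptotical normality of $S_n / B_n$ holds. If $\alpha < -1/2$ then the sequence $B_n$ converges, condition $\bar{\sigma}_n = 1 = o(B_n)$ is not satisfied and the asymptotical normality of $S_n / B_n$ fails to take place.

Note that, for $\alpha = -1/2$, the random variable

\[ S_n = \sum_{k=1}^{n} \frac{\eta_k}{\sqrt{k}} \]

will be “comparable” with $\sqrt{\ln n}$ with a high probability, while the sums

\[ \sum_{k=1}^{n} \frac{(-1)^k}{\sqrt{k}} \]

converge to a constant.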
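The condition $\bar{\sigma}_n = o(B_n)$ in Example 8.4.2 is easy to inspect numerically. The following sketch (function name hypothetical) evaluates the ratio $\bar{\sigma}_n / B_n$ for $\sigma_k = k^{\alpha}$ by direct summation; it only illustrates the three regimes $\alpha > -1/2$, $\alpha = -1/2$ and $\alpha < -1/2$ and proves nothing.

```python
from math import sqrt

def sigma_ratio(alpha: float, n: int) -> float:
    """max_{k<=n} sigma_k / B_n for sigma_k = k**alpha, B_n^2 = sum of k**(2*alpha)."""
    b_n_squared = sum(k ** (2 * alpha) for k in range(1, n + 1))
    sigma_max = max(1.0, float(n) ** alpha)   # attained at k = n if alpha >= 0, else at k = 1
    return sigma_max / sqrt(b_n_squared)

for alpha in (0.5, 0.0, -0.5, -1.0):
    print(alpha, [round(sigma_ratio(alpha, n), 4) for n in (10 ** 2, 10 ** 4, 10 ** 6)])
```

For $\alpha = -1/2$ the ratio decays only like $(\ln n)^{-1/2}$, and for $\alpha = -1$ it stabilises near $(\pi^2/6)^{-1/2}$, in line with the discussion above.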

A rather graphical and well-known illustration of the above theorems is the scattering of shells when shooting at a target. The fact is that the trajectory of a shell is
influenced  by  a  large  number  of  independent  factors  of  which  the  individual  effects 
are  small.  These  are  deviations  in  the  amount  of  gun  powder,  in  the  weight  and  size 
of  a  shell,  variations  in  the  humidity  and  temperature  of  the  air,  wind  direction  and 
velocities  at  different  altitudes  and  so  on.  As  a  result,  the  deviation  of  a  shell  from 
the  aiming  point  is  described  by  the  normal  law  with  an  amazing  accuracy. 

Similar  observations  could  be  made  about  errors  in  measurements  when  their 
accuracy  is  affected  by  many  “small”  factors.  (There  even  exists  a  theory  of  errors 
of  which  the  crucial  element  is  the  central  limit  theorem.) 

On  the  whole,  the  central  limit  theorem  has  a  lot  of  applications  in  various  areas. 
This is due to its universality and robustness under small deviations from the assumptions of the theorem, and its relatively high accuracy even for moderate values
of  n.  The  first  two  noted  qualities  mean  that: 

(1) the theorem is applicable to variables $\xi_{k,n}$ with any distributions so long as the variances of $\xi_{k,n}$ exist and are “negligible”;

(2) the presence of a “moderate” dependence³ between $\xi_{k,n}$ does not change the normality of the limiting distribution.

To illustrate the accuracy of the normal approximation, consider the following example. Let $F_n(x) = \mathbf{P}(S_n / \sqrt{n} < x)$ be the distribution function of the normalised sum $S_n$ of independent variables $\xi_k$ uniformly distributed over $[-\sqrt{3}, \sqrt{3}]$, so that $\operatorname{Var}(\xi_k) = 1$. Then it turns out that already for $n = 5$ (!) the maximum of $|F_n(x) - \Phi(x)|$ over the whole axis of $x$-values does not exceed 0.006 (the maximum is attained near the points $x = \pm 0.7$).
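The figure quoted for $n = 5$ can be explored numerically. The sketch below is one possible check, not the computation used in the text: it assumes the standard closed form for the distribution function of a sum of $n$ independent Uniform(0, 1) variables (the Irwin-Hall formula), maps it to the normalised sum of variables uniform on $[-\sqrt{3}, \sqrt{3}]$, and scans a finite grid of $x$-values. The function names and the grid are hypothetical choices.

```python
from math import comb, erf, factorial, floor, sqrt

def irwin_hall_cdf(x: float, n: int) -> float:
    """Distribution function of the sum of n independent Uniform(0, 1) variables."""
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    s = sum((-1) ** k * comb(n, k) * (x - k) ** n for k in range(floor(x) + 1))
    return s / factorial(n)

def normal_cdf(x: float) -> float:
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def f_n(x: float, n: int) -> float:
    """F_n(x) = P(S_n / sqrt(n) < x) for summands uniform on [-sqrt(3), sqrt(3)]."""
    # xi = sqrt(3) * (2U - 1), hence S_n / sqrt(n) < x  <=>  U_1 + ... + U_n < (n + x*sqrt(n/3)) / 2
    return irwin_hall_cdf((n + x * sqrt(n / 3.0)) / 2.0, n)

def max_gap(n: int, grid: int = 4001, lo: float = -4.0, hi: float = 4.0):
    """Largest |F_n(x) - Phi(x)| over an equally spaced grid, and where it occurs."""
    best_gap, best_x = 0.0, lo
    for i in range(grid):
        x = lo + (hi - lo) * i / (grid - 1)
        gap = abs(f_n(x, n) - normal_cdf(x))
        if gap > best_gap:
            best_gap, best_x = gap, x
    return best_gap, best_x

print(max_gap(5))   # compare with the bound 0.006 and the location x = +-0.7 quoted above
```

Running the same function for other values of $n$ gives a quick impression of how fast the gap shrinks.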

And  still,  despite  the  above  circumstances,  one  has  to  be  careful  when  applying 
the  central  limit  theorem.  For  instance,  one  cannot  expect  high  accuracy  from  the 
normal approximation when estimating probabilities of rare events, say when studying large deviation probabilities (this issue has already been discussed in Sect. 5.3).


3There  exist  several  conditions  characterising  admissible  dependence  of  Such  considerations 
are  beyond  the  scope  of  the  present  book,  but  can  be  found  in  the  special  literature.  See  e.g.  [20]. 
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After  all,  the  theorem  only  ensures  the  smallness  of  the  difference 


|0(x)-P(C  <x) 


(8.4.12) 


for  large  n.  Suppose  we  want  to  use  the  normal  approximation  to  find  an  vo  such 
that  the  event  {£n  >  vo}  would  occur  on  average  once  in  1000  trials  (a  problem 
of  this  sort  could  be  encountered  by  an  experimenter  who  wants  to  ensure  that,  in 
a  single  experiment,  such  an  event  will  not  occur).  Even  if  the  difference  (8.4.12) 
does  not  exceed  0.02  (which  can  be  a  good  approximation)  then,  using  the  normal 
approximation,  we  risk  making  a  serious  error.  It  can  turn  out,  say,  that  l  —  &(xo)  = 
10-3  while  P(£  <  v)  ^  0.02,  and  then  the  event  {%n  >  vo)  will  occur  much  more 
often  (on  average,  once  in  each  50  trials). 
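The arithmetic behind the once-in-1000 example can be sketched in a few lines. The value `hypothetical_tail` below is simply the illustrative figure 0.02 from the text, not a computed probability, and all variable names are assumptions of this sketch.

```python
from statistics import NormalDist

std_normal = NormalDist()               # standard normal law Phi

target = 1e-3                           # desired frequency: once in 1000 trials
x0 = std_normal.inv_cdf(1.0 - target)   # threshold suggested by the normal approximation

abs_error = 0.02                        # assumed bound for the difference (8.4.12)
hypothetical_tail = 0.02                # a true tail value compatible with that bound

print(f"x0 = {x0:.3f}")
print(f"normal approximation: once in {1 / target:.0f} trials")
print(f"hypothetical true tail: once in {1 / hypothetical_tail:.0f} trials")
print("compatible with (8.4.12):", abs(hypothetical_tail - target) <= abs_error)
```

The point of the sketch is only that an absolute error of 0.02 is twenty times larger than the probability being estimated, so it says nothing about the relative accuracy in the tail.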

In Chap. 9 we will consider the problem of large deviation probabilities that enables one to handle such situations. In that case one looks for a function $P(n, x)$ such that $\mathbf{P}(\zeta_n > x)/P(n, x) \to 1$ as $n \to \infty$, $x \to \infty$. The function $P(n, x)$ turns out to be, generally speaking, different from $1 - \Phi(x)$. We should note however that using the approximation $P(n, x)$ requires more restrictive conditions on $\{\xi_{k,n}\}$. In Sect. 8.7 we will consider the so-called integro-local and local limit theorems that establish convergence of the density of $\zeta_n$ to that of the normal law and enable one to estimate probabilities of rare events of another sort, say, of the form $\{a < \zeta_n < b\}$ where $a$ and $b$ are close to each other.


8.5  Another  Approach  to  Proving  Limit  Theorems.  Estimating 
Approximation  Rates 

The approach to proving the principal limit theorems for the distributions of sums of random variables that we considered in Sects. 8.1–8.4 was based on the use of ch.f.s. However, this is by no means the only method of proof of such assertions. Nowadays there exist several rather simple proofs of both the laws of large numbers and the central limit theorem that do not use the apparatus of ch.f.s. (This, however, does not belittle that powerful, well-developed, and rather universal tool.) Moreover, these proofs sometimes enable one to obtain more general results. As an illustration, we will give below a proof of the central limit theorem that extends, in a certain sense, Theorems 8.4.1 and 8.4.3 and provides an estimate of the convergence rate (although not the best one).

Along with the random variables $\xi_{k,n}$ in the triangular array scheme under assumption (8.4.2), consider random variables $\eta_{k,n} \mathrel{\subset\!\!\!=} \Phi_{0,\sigma^2_{k,n}}$, $\sigma^2_{k,n} := \operatorname{Var}(\xi_{k,n})$, that are mutually independent and independent of the sequence $\{\xi_{k,n}\}_{k=1}^n$, so that

$$\eta_n := \sum_{k=1}^n \eta_{k,n} \mathrel{\subset\!\!\!=} \Phi_{0,1}.$$
Set⁴

$$\mu_{k,n} := \mathbf{E}|\xi_{k,n}|^3, \qquad \nu_{k,n} := \mathbf{E}|\eta_{k,n}|^3 = c_3 \sigma^3_{k,n} \le c_3 \mu_{k,n},$$

$$\mu^0_{k,n} := \int |x|^3\, |d(F_{k,n}(x) - \Phi_{k,n}(x))| \le \mu_{k,n} + \nu_{k,n},$$

$$L_3 := \sum_{k=1}^n \mu_{k,n}, \qquad N_3 := \sum_{k=1}^n \nu_{k,n}, \qquad L_3^0 := \sum_{k=1}^n \mu^0_{k,n} \le L_3 + N_3 \le (1 + c_3) L_3.$$

Here $F_{k,n}$ and $\Phi_{k,n}$ are the distribution functions of $\xi_{k,n}$ and $\eta_{k,n}$, respectively. The quantities $L_3$ and $N_3$ are the third order Lyapunov fractions for the sequences $\{\xi_{k,n}\}$ and $\{\eta_{k,n}\}$. The quantities $\mu^0_{k,n}$ are called the third order pseudomoments and $L_3^0$ the Lyapunov fractions for pseudomoments. Clearly, $N_3 \le c_3 L_3 \to 0$, provided that the Lyapunov condition holds. As we have already noted, for $\xi_{k,n} = (\xi_k - a_k)/B_n$, where $a_k = \mathbf{E}\xi_k$, $B_n^2 = \sum_{k=1}^n \operatorname{Var}(\xi_k)$, and $\xi_k$ do not depend on $n$, one has

$$L_3 = \frac{1}{B_n^3} \sum_{k=1}^n \mu_k, \qquad \mu_k = \mathbf{E}|\xi_k - a_k|^3.$$

If, moreover, $\xi_k$ are identically distributed, then

$$L_3 = \frac{\mu_1}{\sigma^3 \sqrt{n}}.$$

Our first task here is to estimate the closeness of $\mathbf{E} f(\zeta_n)$ to $\mathbf{E} f(\eta_n)$ for sufficiently smooth $f$. This problem could be of independent interest. Assume that $f$ belongs to the class $C_3$ of all bounded functions with uniformly continuous and bounded third derivatives: $\sup_x |f^{(3)}(x)| \le f_3$.


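Theorem 8.5.1 If $f \in C_3$ then

$$|\mathbf{E} f(\zeta_n) - \mathbf{E} f(\eta_n)| \le \frac{f_3 L_3^0}{6} \le \frac{f_3}{6}(L_3 + N_3). \tag{8.5.1}$$

Proof Put, for $1 \le l \le n$,

$$X_l := \xi_{1,n} + \cdots + \xi_{l-1,n} + \eta_{l,n} + \cdots + \eta_{n,n},$$

$$Z_l := \xi_{1,n} + \cdots + \xi_{l-1,n} + \eta_{l+1,n} + \cdots + \eta_{n,n},$$

$$X_1 = \eta_n, \qquad X_{n+1} := \zeta_n.$$

Then

$$X_{l+1} = Z_l + \xi_{l,n}, \qquad X_l = Z_l + \eta_{l,n}, \tag{8.5.2}$$

$$f(\zeta_n) - f(\eta_n) = \sum_{l=1}^n [f(X_{l+1}) - f(X_l)]. \tag{8.5.3}$$

⁴ If $\eta \mathrel{\subset\!\!\!=} \Phi_{0,1}$ then $c_3 = \mathbf{E}|\eta|^3 = \frac{2}{\sqrt{2\pi}} \int_0^\infty x^3 e^{-x^2/2}\,dx = \frac{4}{\sqrt{2\pi}} \int_0^\infty t e^{-t}\,dt = \frac{4}{\sqrt{2\pi}}$.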


Now  we  will  make  use  of  the  following  lemma. 


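Lemma 8.5.1 Let $f \in C_3$ and $Z$, $\xi$ and $\eta$ be independent random variables with

$$\mathbf{E}\xi = \mathbf{E}\eta = a, \qquad \mathbf{E}\xi^2 = \mathbf{E}\eta^2 = \sigma^2, \qquad \mu^0 := \int |x|^3\, |d(F_\xi(x) - F_\eta(x))| < \infty.$$

Then

$$|\mathbf{E} f(Z + \xi) - \mathbf{E} f(Z + \eta)| \le \frac{f_3 \mu^0}{6}. \tag{8.5.4}$$

Applying this lemma to (8.5.3), we get

$$|\mathbf{E}[f(X_{l+1}) - f(X_l)]| \le \frac{f_3 \mu^0_{l,n}}{6},$$

which after summation gives (8.5.1). The theorem is proved. □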


Thus  to  complete  the  argument  proving  Theorem  8.5.1  it  remains  to  prove 
Lemma  8.5.1. 


Proof of Lemma 8.5.1 Set $g(x) := \mathbf{E} f(Z + x)$. It is evident that $g$, being the result of the averaging of $f$, has all the smoothness properties of $f$ and, in particular, $|g'''(x)| \le f_3$. By virtue of the independence of $Z$, $\xi$ and $\eta$, we have

$$\mathbf{E} f(Z + \xi) - \mathbf{E} f(Z + \eta) = \int g(x)\, d(F_\xi(x) - F_\eta(x)). \tag{8.5.5}$$

For the integrand, we make use of the expansion

$$g(x) = g(0) + x g'(0) + \frac{x^2}{2} g''(0) + \frac{x^3}{6} g'''(\theta_x), \qquad \theta_x \in [0, x].$$

Since the first and second moments of $\xi$ coincide with those of $\eta$, we obtain for the right-hand side of (8.5.5) the bound

$$\left| \frac{1}{6} \int x^3 g'''(\theta_x)\, d(F_\xi(x) - F_\eta(x)) \right| \le \frac{f_3 \mu^0}{6}.$$

The lemma is proved. □
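As a sanity check of bound (8.5.1), one can take a case in which both expectations are available in closed form. The sketch below is an illustration under stated assumptions, not part of the original argument: the summands are taken to be symmetric $\pm 1/\sqrt{n}$ variables and $f = \cos$, for which $f_3 = 1$; the function name `check_bound` is hypothetical.

```python
import math

C3 = 4.0 / math.sqrt(2.0 * math.pi)   # c_3 = E|eta|^3 for a standard normal eta

def check_bound(n):
    """Return (|E f(zeta_n) - E f(eta_n)|, f_3 (L_3 + N_3) / 6) for f = cos."""
    # zeta_n = (eps_1 + ... + eps_n) / sqrt(n) with P(eps_k = 1) = P(eps_k = -1) = 1/2,
    # so E cos(zeta_n) = Re E exp(i zeta_n) = cos(1 / sqrt(n)) ** n.
    e_f_zeta = math.cos(1.0 / math.sqrt(n)) ** n
    # eta_n is standard normal, hence E cos(eta_n) = exp(-1/2).
    e_f_eta = math.exp(-0.5)
    f3 = 1.0                          # sup |f'''| = sup |sin| = 1
    l3 = 1.0 / math.sqrt(n)           # L_3 = n * E|eps_k / sqrt(n)|^3
    n3 = C3 / math.sqrt(n)            # N_3 = n * c_3 * (1 / sqrt(n))^3
    return abs(e_f_zeta - e_f_eta), f3 * (l3 + n3) / 6.0

for n in (1, 4, 25, 100):
    lhs, rhs = check_bound(n)
    print(n, lhs, rhs)
```

In this symmetric example the actual difference decays much faster than the bound, since the third moments of the summands and of their normal counterparts both vanish.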


Remark 8.5.1 In exactly the same way one can establish the representation

$$|\mathbf{E} f(\zeta_n) - \mathbf{E} f(\eta_n)| \le \frac{g'''(0)}{6} \left| \sum_{k=1}^n \mathbf{E}(\xi^3_{k,n} - \eta^3_{k,n}) \right| + \frac{f_4 L_4^0}{24} \tag{8.5.6}$$

under obvious conventions for the notations $f_4$ and $L_4^0$. This bound can improve upon (8.5.1) if the differences $\mathbf{E}(\xi^3_{k,n} - \eta^3_{k,n})$ are small. If, for instance, $\xi_{k,n} = (\xi_k - a)/(\sigma\sqrt{n})$, $\xi_k$ are identically distributed, and the third moments of $\xi_{k,n}$ and $\eta_{k,n}$ coincide, then on the right-hand side of (8.5.6) we will have a quantity of the order $1/n$.


Theorem 8.5.1 extends Theorem 8.4.1 in the case when $s = 3$. The extension is that, to establish convergence $\zeta_n \mathrel{\subset\!\!\!\Rightarrow} \Phi_{0,1}$, one no longer needs the negligibility of $\xi_{k,n}$. If, for example, $\xi_{1,n} \mathrel{\subset\!\!\!=} \Phi_{0,1/2}$ (in that case $\mu^0_{1,n} = 0$) and $L_3^0 \to 0$, then $\mathbf{E} f(\zeta_n) \to \mathbf{E} f(\eta)$, $\eta \mathrel{\subset\!\!\!=} \Phi_{0,1}$, for any $f$ from the class $C_3$. Since $C_3$ is a distribution determining class (see Chap. 6), it remains to make use of Corollary 6.3.2.

We  can  strengthen  the  above  assertion. 


Theorem 8.5.2 For any $x \in \mathbb{R}$,

$$|\mathbf{P}(\zeta_n < x) - \Phi(x)| \le c (L_3^0)^{1/4}, \tag{8.5.7}$$

where $c$ is an absolute constant.

Proof Take an arbitrary function $h \in C_3$, $0 \le h \le 1$, such that $h(x) = 1$ for $x \le 0$ and $h(x) = 0$ for $x \ge 1$, and put $h_3 = \sup_x |h'''(x)|$. Then, for the function $f(x) = h((x - t)/\varepsilon)$, we will have $f_3 = \sup_x |f'''(x)| \le h_3/\varepsilon^3$, and by Theorem 8.5.1

$$\mathbf{P}(\zeta_n < t) \le \mathbf{E} f(\zeta_n) \le \mathbf{E} f(\eta) + \frac{f_3 L_3^0}{6} \le \mathbf{P}(\eta < t + \varepsilon) + \frac{h_3 L_3^0}{6\varepsilon^3} \le \mathbf{P}(\eta < t) + \frac{\varepsilon}{\sqrt{2\pi}} + \frac{h_3 L_3^0}{6\varepsilon^3}.$$

The last inequality holds since the maximum of the derivative of the normal distribution function $\Phi(t) = \mathbf{P}(\eta < t)$ is equal to $1/\sqrt{2\pi}$. Establishing in the same way the converse inequality and putting $\varepsilon = (L_3^0)^{1/4}$, we arrive at (8.5.7). The theorem is proved. □


The bound in Theorem 8.5.2 is, of course, not the best one. And yet inequality (8.5.7) shows that we will have a good normal approximation for $\mathbf{P}(\zeta_n < x)$ in the large deviations range (i.e. for $|x| \to \infty$) as well, at least for those $x$ for which

$$(1 - \Phi(|x|))\,(L_3^0)^{-1/4} \to \infty \tag{8.5.8}$$

as $n \to \infty$. Indeed, in that case, say, for $x = |x| > 0$,

$$\left| \frac{\mathbf{P}(\zeta_n \ge x)}{1 - \Phi(x)} - 1 \right| \le \frac{c (L_3^0)^{1/4}}{1 - \Phi(x)} \to 0.$$

Since by L'Hospital's rule

$$1 - \Phi(x) \sim \frac{1}{\sqrt{2\pi}\, x}\, e^{-x^2/2} \quad \text{as } x \to \infty,$$

(8.5.8) holds for $|x| < c_1 \sqrt{-\ln L_3^0}$ with an appropriately chosen constant $c_1$.
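The tail asymptotics used here, $1 - \Phi(x) \sim e^{-x^2/2}/(\sqrt{2\pi}\,x)$, is easy to examine numerically. The following sketch (helper names are illustrative) prints the ratio of the exact tail to its leading term for a few values of $x$; the ratio should approach 1 from below as $x$ grows.

```python
import math

def normal_tail(x):
    """1 - Phi(x), computed via erfc to avoid cancellation for large x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def tail_asymptotic(x):
    """exp(-x^2 / 2) / (sqrt(2 pi) x), the leading term as x -> infinity."""
    return math.exp(-0.5 * x * x) / (math.sqrt(2.0 * math.pi) * x)

for x in (1.0, 2.0, 4.0, 8.0):
    print(x, normal_tail(x) / tail_asymptotic(x))
```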

In  Chap.  20  we  will  obtain  an  extension  of  Theorems  8.5.1  and  8.5.2. 

The problem of refinements and approximation rate bounds in the central limit theorem and other limit theorems is one of the most important in probability theory, because solving it will tell us how precise and efficient the applications of these theorems to practical problems will be. First of all, one has to find the true order of the decay of

$$\Delta_n = \sup_x |\mathbf{P}(\zeta_n < x) - \Phi(x)|$$

in $n$ (or, say, in $L_3$ in the case of non-identically distributed variables). There exist at least two approaches to finding sharp bounds for $\Delta_n$. The first one, the so-called method of characteristic functions, is based on the unimprovable bound for the closeness of the ch.f.s

$$\left| \ln \varphi_{\zeta_n}(t) + \frac{t^2}{2} \right| \le c L_3$$

that the reader can obtain by him/herself, using Lemma 7.4.1 and somewhat modifying the argument in the proof of Theorem 8.4.1. The principal technical difficulties here are in deriving, using the inversion formula, the same order of smallness for $\Delta_n$.

The second approach, the so-called method of compositions, has been illustrated in the present section in Theorem 8.5.1 (the idea of the method is expressed, to a certain extent, by relation (8.5.3)). It is precisely this method that we will use in Appendix 5 to prove the following general result (Cramér–Berry–Esseen):


Theorem 8.5.3 If $\xi_{k,n} = (\xi_k - a_k)/B_n$, where $\xi_k$ do not depend on $n$, then

$$\sup_x |\mathbf{P}(\zeta_n < x) - \Phi(x)| \le c L_3,$$

where $c$ is an absolute constant.

In the case of identically distributed $\xi_k$ the right-hand side of the above inequality becomes $c\mu_1/(\sigma^3\sqrt{n})$. It was established that in this case $(2\pi)^{-1/2} < c < 0.4774$, while in the case of non-identically distributed summands $c < 0.5591$.

One  should  keep  in  mind  that  the  above  theorems  and  the  bounds  for  the  constant 
c  are  universal  and  therefore  hold  under  the  most  unfavourable  conditions  (from 
the  point  of  view  of  the  approximation).  In  real  problems,  the  convergence  rate  is 
usually  much  better. 
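To get a feeling for how conservative the universal bound is, one can compare it with the exact distance for Bernoulli sums. The sketch below is illustrative only: it assumes summands taking the values 0 and 1 with probabilities $1 - p$ and $p$, uses the constant 0.4774 quoted above for identically distributed summands, and the function names are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sup_distance_bernoulli(n, p):
    """sup_x |P(zeta_n < x) - Phi(x)| for the normalised sum of n Bernoulli(p) variables."""
    q = 1.0 - p
    sd = math.sqrt(n * p * q)
    cdf_left = 0.0                      # P(S_n < k)
    worst = 0.0
    for k in range(n + 1):
        pmf = math.comb(n, k) * p ** k * q ** (n - k)
        phi = normal_cdf((k - n * p) / sd)
        # The distribution function jumps at x_k = (k - n p) / sd, so the supremum
        # is approached either just before or just after the jump.
        worst = max(worst, abs(cdf_left - phi), abs(cdf_left + pmf - phi))
        cdf_left += pmf
    return worst

def berry_esseen_bound(n, p, c=0.4774):
    """c * mu_1 / (sigma^3 sqrt(n)) with mu_1 = E|xi - p|^3."""
    q = 1.0 - p
    mu1 = p * q * (p * p + q * q)
    sigma = math.sqrt(p * q)
    return c * mu1 / (sigma ** 3 * math.sqrt(n))

for n, p in ((10, 0.5), (50, 0.1), (200, 0.3)):
    print(n, p, sup_distance_bernoulli(n, p), berry_esseen_bound(n, p))
```

For lattice summands such as these the distance is dominated by the jumps of the distribution function, which is one reason why the order $1/\sqrt{n}$ cannot be improved in general.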


5 See  [33]. 
8.6 The Law of Large Numbers and the Central Limit Theorem
in the Multivariate Case

In this section we assume that $\xi_{1,n}, \ldots, \xi_{n,n}$ are random vectors in the triangular array scheme,

$$\mathbf{E}\xi_{k,n} = 0, \qquad \zeta_n = \sum_{k=1}^n \xi_{k,n}.$$

The law of large numbers $\zeta_n \stackrel{p}{\to} 0$ follows immediately from Theorem 8.3.1, if we assume that the components of $\xi_{k,n}$ satisfy the conditions of that theorem. Thus we can assume that Theorem 8.3.1 was formulated and proved for vectors.

Dealing with the central limit theorem is somewhat more complicated. Here we will assume that $\mathbf{E}|\xi_{k,n}|^2 < \infty$, where $|x|^2 = (x, x)$ is the square of the norm of $x$. Let

$$\sigma^2_{k,n} := \mathbf{E}\,\xi^T_{k,n} \xi_{k,n}, \qquad \sigma^2_n := \sum_{k=1}^n \sigma^2_{k,n}$$

(the superscript $T$ denotes transposition, so that $\xi^T_{k,n}$ is a column vector).

Introduce the condition

$$\sum_{k=1}^n \mathbf{E} \min(|\xi_{k,n}|^2, |\xi_{k,n}|^s) \to 0, \qquad s > 2, \tag*{[D2]}$$

and the Lindeberg condition

$$\sum_{k=1}^n \mathbf{E}(|\xi_{k,n}|^2;\ |\xi_{k,n}| > \tau) \to 0 \tag*{[M2]}$$

as $n \to \infty$ for any $\tau > 0$. As in the univariate case, we can easily verify that conditions [D2] and [M2] are equivalent provided that $\operatorname{tr} \sigma^2_n := \sum_{j=1}^d (\sigma^2_n)_{jj} < c < \infty$.

Theorem 8.6.1 If $\sigma^2_n \to \sigma^2$, where $\sigma^2$ is a positive definite matrix, and condition [D2] (or [M2]) is met, then

$$\zeta_n \mathrel{\subset\!\!\!\Rightarrow} \Phi_{0,\sigma^2}.$$

Corollary 8.6.1 ("The conventional" central limit theorem) If $\xi_1, \xi_2, \ldots$ is a sequence of independent identically distributed random vectors, $\mathbf{E}\xi_k = 0$, $\sigma^2 = \mathbf{E}\,\xi_k^T \xi_k$ and $S_n = \sum_{k=1}^n \xi_k$ then, as $n \to \infty$,

$$\frac{S_n}{\sqrt{n}} \mathrel{\subset\!\!\!\Rightarrow} \Phi_{0,\sigma^2}.$$

This assertion is a consequence of Theorem 8.6.1, since the random variables $\xi_{k,n} = \xi_k/\sqrt{n}$ satisfy its conditions.


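Proof of Theorem 8.6.1 Consider the characteristic functions

$$\varphi_{k,n}(t) := \mathbf{E} e^{i(t, \xi_{k,n})}, \qquad \varphi_n(t) := \mathbf{E} e^{i(t, \zeta_n)} = \prod_{k=1}^n \varphi_{k,n}(t).$$

In order to prove the theorem we have to verify that, for any $t$, as $n \to \infty$,

$$\varphi_n(t) \to \exp\Bigl\{ -\frac{1}{2}\, t \sigma^2 t^T \Bigr\}.$$

We make use of Theorem 8.4.1. We can interpret $\varphi_{k,n}(t)$ and $\varphi_n(t)$ as the ch.f.s

$$\varphi^\theta_{k,n}(v) = \mathbf{E} \exp(i v \xi^\theta_{k,n}), \qquad \varphi^\theta_n(v) = \mathbf{E} \exp(i v \zeta^\theta_n)$$

of the random variables $\xi^\theta_{k,n} = (\xi_{k,n}, \theta)$, $\zeta^\theta_n = (\zeta_n, \theta)$, where $\theta = t/|t|$, $v = |t|$. Let us show that the scalar random variables $\xi^\theta_{k,n}$ satisfy the conditions of Theorem 8.4.1 (or Corollary 8.4.1) for the univariate case. Clearly,

$$\mathbf{E}\xi^\theta_{k,n} = 0, \qquad \sum_{k=1}^n \mathbf{E}(\xi^\theta_{k,n})^2 = \sum_{k=1}^n \mathbf{E}(\xi_{k,n}, \theta)^2 = \theta \sigma^2_n \theta^T \to \theta \sigma^2 \theta^T > 0.$$

That condition [D2] is satisfied follows from the obvious inequalities

$$(\xi_{k,n}, \theta)^2 = |\xi^\theta_{k,n}|^2 \le |\xi_{k,n}|^2, \qquad \sum_{k=1}^n \mathbf{E} g_2(|\xi^\theta_{k,n}|) \le \sum_{k=1}^n \mathbf{E} g_2(|\xi_{k,n}|),$$

where $g_2(x) = \min(x^2, |x|^s)$, $s > 2$. Thus, for any $v$ and $\theta$ (i.e., for any $t$), by Corollary 8.4.1 of Theorem 8.4.1

$$\varphi_n(t) = \mathbf{E} \exp\{ i v \zeta^\theta_n \} \to \exp\Bigl\{ -\frac{1}{2}\, v^2 \theta \sigma^2 \theta^T \Bigr\} = \exp\Bigl\{ -\frac{1}{2}\, t \sigma^2 t^T \Bigr\}.$$

The theorem is proved. □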


Theorem 8.6.1 does not cover the case where the entries of the matrix $\sigma^2_n$ grow unboundedly or behave in such a way that the rank of the limiting matrix $\sigma^2$ becomes less than the dimension of the vectors $\xi_{k,n}$. This can happen when the variances of different components of $\xi_{k,n}$ have different orders of decay (or growth). In such a case, one should consider the transformed sums $\zeta'_n = \zeta_n \sigma_n^{-1}$ instead of $\zeta_n$. Theorem 8.6.1 is actually a consequence of the following more general assertion which, in turn, follows from Theorem 8.6.1.

Theorem 8.6.2 If the random variables $\xi'_{k,n} = \xi_{k,n} \sigma_n^{-1}$ satisfy condition [D2] (or [M2]) then $\zeta'_n \mathrel{\subset\!\!\!\Rightarrow} \Phi_{0,E}$, where $E$ is the identity matrix.
8.7 Integro-Local and Local Limit Theorems for Sums of
Identically Distributed Random Variables with Finite
Variance

Theorem  8.2.1  from  Sect.  8.2  is  called  the  integral  limit  theorem.  To  understand 
the  reasons  for  using  such  a  name,  one  should  compare  this  assertion  with  (more 
accurate)  limit  theorems  of  another  type,  that  describe  the  asymptotic  behaviour  of 
the  densities  of  the  distributions  of  Sn  (if  any)  or  the  asymptotics  of  the  probabilities 
of  sums  Sn  hitting  a  fixed  interval.  It  is  natural  to  call  the  theorems  for  densities  local 
theorems.  Theorems  similar  to  Theorem  8.2.1  can  be  obtained  from  the  local  ones 
(if  the  densities  exist)  by  integrating,  and  it  is  natural  to  call  them  integral  theorems. 
Assertions  about  the  asymptotics  of  the  probabilities  of  Sn  hitting  an  interval  are 
“intermediate”  between  the  local  and  integral  theorems,  and  it  is  natural  to  call  them 
integro-local  theorems.  In  the  literature,  such  statements  are  often  also  referred  to 
as  local ,  apparently  because  they  describe  the  probability  of  the  localisation  of  the 
sum  Sn  in  a  given  interval. 


8.7.1  Integro-Local  Theorems 

Integro-local theorems describe the asymptotics of

$$\mathbf{P}(S_n \in [x, x + \Delta))$$

as $n \to \infty$ for a fixed $\Delta > 0$. Probabilities of this type for increasing $\Delta$ (or for $\Delta = \infty$) can clearly be obtained by summing the corresponding probabilities for fixed $\Delta$.

We will derive integro-local and local theorems with the help of the inversion formulas from Sect. 7.2.

For the sake of brevity, put

$$\Delta[x) = [x, x + \Delta)$$

and denote by $\phi(x) = \phi_{0,1}(x)$ the density of the standard normal distribution. Below we will restrict ourselves to the investigation of the sums $S_n = \xi_1 + \cdots + \xi_n$ of independent identically distributed random variables $\xi_k \stackrel{d}{=} \xi$.

Theorem 8.7.1 (The Stone-Shepp integro-local theorem) Let $\xi$ be a non-lattice random variable, $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 = \sigma^2 < \infty$. Then, for any fixed $\Delta > 0$, as $n \to \infty$,

$$\mathbf{P}(S_n \in \Delta[x)) = \frac{\Delta}{\sigma\sqrt{n}}\, \phi\Bigl( \frac{x}{\sigma\sqrt{n}} \Bigr) + o\Bigl( \frac{1}{\sqrt{n}} \Bigr), \tag{8.7.1}$$

where the remainder term $o(1/\sqrt{n})$ is uniform in $x$.
Remark 8.7.1 Since relation (8.7.1) is valid for any fixed $\Delta$, it will also be valid when $\Delta = \Delta_n \to 0$ slowly enough as $n \to \infty$. If $\Delta = \Delta_n$ grows then the asymptotics of $\mathbf{P}(S_n \in \Delta[x))$ can be obtained by summing the right-hand sides of (8.7.1) for, say, $\Delta = 1$ (if $\Delta_n \to \infty$ is integer-valued). Thus the integral theorem follows from the integro-local one but not vice versa.

Remark 8.7.2 By virtue of the properties of densities (see Sect. 3.2), the right-hand side of representation (8.7.1) has the same form as if the random variable $\zeta_n = S_n/(\sigma\sqrt{n})$ had the density $\phi(v) + o(1)$, although the existence of the density of $S_n$ (or $\zeta_n$) is not assumed in the theorem.
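Relation (8.7.1) can be examined on an example where the interval probabilities are computable exactly. The sketch below assumes, purely for illustration, summands $\xi_k = E_k - 1$ with standard exponential $E_k$ (mean 0, variance 1), so that $S_n + n$ has a gamma distribution with integer shape; the function names and the grid are hypothetical choices. It reports $\sqrt{n}$ times the largest deviation from the main term of (8.7.1), which should shrink as $n$ grows.

```python
import math

def gamma_cdf(n, x):
    """P(E_1 + ... + E_n <= x) for independent standard exponential E_k, integer n."""
    if x <= 0.0:
        return 0.0
    log_x = math.log(x)
    # The sum exceeds x exactly when a unit-rate Poisson process has fewer than n points on [0, x].
    poisson_head = sum(math.exp(k * log_x - x - math.lgamma(k + 1)) for k in range(n))
    return 1.0 - poisson_head

def normal_density(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def scaled_error(n, delta=1.0, points=200):
    """sqrt(n) * max_x |P(S_n in [x, x + delta)) - delta / sqrt(n) * phi(x / sqrt(n))|."""
    root_n = math.sqrt(n)
    worst = 0.0
    for i in range(points + 1):
        x = (-5.0 + 10.0 * i / points) * root_n      # grid over [-5 sqrt(n), 5 sqrt(n)]
        # xi_k = E_k - 1, so S_n in [x, x + delta) iff the gamma sum lies in [n + x, n + x + delta)
        exact = gamma_cdf(n, n + x + delta) - gamma_cdf(n, n + x)
        approx = delta / root_n * normal_density(x / root_n)
        worst = max(worst, abs(exact - approx))
    return root_n * worst

for n in (25, 100, 400):
    print(n, scaled_error(n))
```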


Proof of Theorem 8.7.1 First prove the theorem under the simplifying assumption that condition

$$\limsup_{|t| \to \infty} |\varphi(t)| < 1 \tag{8.7.2}$$

is satisfied (the Cramér condition on the ch.f.). Property 11 of ch.f.s (see Sect. 7.1) implies that this condition is always met if the distribution of the sum $S_m$, for some $m \ge 1$, has a positive absolutely continuous component. The proof of Theorem 8.7.1 in its general form is more complicated and will be given at the end of this section, in Sect. 8.7.3.

In order to use the inversion formula (7.2.8), we employ the "smoothing method" and consider, along with $S_n$, the sums

$$Z_n = S_n + \eta_\delta, \tag{8.7.3}$$

where $\eta_\delta \mathrel{\subset\!\!\!=} U_{-\delta,0}$. Since the ch.f. $\varphi_{\eta_\delta}(t)$ of the random variable $\eta_\delta$, being equal to

$$\varphi_{\eta_\delta}(t) = \frac{1 - e^{-it\delta}}{it\delta}, \tag{8.7.4}$$

possesses the property that the function $\varphi_{\eta_\delta}(t)/t$ is integrable at infinity, for the increments of the distribution function $G_n(x)$ of the random variable $Z_n$ (its ch.f. divided by $t$ is integrable, too) we can use formula (7.2.8):

$$G_n(x + \Delta) - G_n(x) = \mathbf{P}(Z_n \in \Delta[x)) = \frac{1}{2\pi} \int e^{-itx}\, \frac{1 - e^{-it\Delta}}{it}\, \varphi^n(t) \varphi_{\eta_\delta}(t)\, dt = \frac{\Delta}{2\pi} \int e^{-itx} \varphi^n(t) \widehat{\varphi}(t)\, dt, \tag{8.7.5}$$

where $\widehat{\varphi}(t) = \varphi_{\eta_\delta}(t) \varphi_{\eta_\Delta}(t)$ (cf. (7.2.8)) is the ch.f. of the sum of independent random variables $\eta_\delta$ and $\eta_\Delta$. We obtain that the difference $G_n(x + \Delta) - G_n(x)$, up to the factor $\Delta$, is nothing else but the value of the density of the random variable $S_n + \eta_\delta + \eta_\Delta$ at the point $x$.

Split the integral on the right-hand side of (8.7.5) into the two subintegrals: one over the domain $|t| < \gamma$ for some $\gamma < 1$, and the other over the complementary domain. Put $x = v\sqrt{n}$ and consider first

$$I_1 := \int_{|t| < \gamma} e^{-itv\sqrt{n}} \varphi^n(t) \widehat{\varphi}(t)\, dt = \frac{1}{\sqrt{n}} \int_{|u| < \gamma\sqrt{n}} e^{-iuv} \varphi^n\Bigl( \frac{u}{\sqrt{n}} \Bigr) \widehat{\varphi}\Bigl( \frac{u}{\sqrt{n}} \Bigr) du.$$

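Without loss of generality we can assume $\sigma = 1$, and by (8.2.1) obtain that

$$1 - \varphi(t) = \frac{t^2}{2} + o(t^2), \qquad \ln \varphi(t) = \ln[1 - (1 - \varphi(t))] = -\frac{t^2}{2} + o(t^2) \quad \text{as } t \to 0. \tag{8.7.6}$$

Hence

$$n \ln \varphi\Bigl( \frac{u}{\sqrt{n}} \Bigr) = -\frac{u^2}{2} + h_n(u), \tag{8.7.7}$$

where $h_n(u) \to 0$ for any fixed $u$ as $n \to \infty$. Moreover, for $\gamma$ small enough, in the domain $|u| < \gamma\sqrt{n}$ we have

$$|h_n(u)| \le \frac{u^2}{6},$$

so the right-hand side of (8.7.7) does not exceed $-u^2/3$. Now we can rewrite $I_1$ in the form

$$I_1 = \frac{1}{\sqrt{n}} \int_{|u| < \gamma\sqrt{n}} \exp\Bigl\{ -iuv - \frac{u^2}{2} + h_n(u) \Bigr\} \widehat{\varphi}\Bigl( \frac{u}{\sqrt{n}} \Bigr) du, \tag{8.7.8}$$

where $|\widehat{\varphi}(u/\sqrt{n})| \le 1$ and $\widehat{\varphi}(u/\sqrt{n}) \to 1$ for any fixed $u$ as $n \to \infty$. Therefore, by virtue of the dominated convergence theorem,

$$\sqrt{n}\, I_1 \to \int \exp\Bigl\{ -iuv - \frac{u^2}{2} \Bigr\} du \tag{8.7.9}$$

uniformly in $v$, since the integral on the right-hand side of (8.7.8) is uniformly continuous in $v$. But the integral on the right-hand side of (8.7.9) is simply (up to the factor $1/(2\pi)$) the result of applying the inversion formula to the ch.f. of the normal distribution, so that

$$\lim_{n \to \infty} \sqrt{n}\, I_1 = \sqrt{2\pi}\, e^{-v^2/2}. \tag{8.7.10}$$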

It remains to consider the integral

$$I_2 := \int_{|t| \ge \gamma} e^{-itv\sqrt{n}} \varphi^n(t) \widehat{\varphi}(t)\, dt.$$

By virtue of (8.7.2) and non-latticeness of the distribution of $\xi$,

$$q := \sup_{|t| \ge \gamma} |\varphi(t)| < 1 \tag{8.7.11}$$

and therefore

$$|I_2| \le q^n \int_{|t| \ge \gamma} |\widehat{\varphi}(t)|\, dt \le q^n c(\Delta, \delta), \qquad \lim_{n \to \infty} \sqrt{n}\, I_2 = 0 \tag{8.7.12}$$

uniformly in $v$, where $c(\Delta, \delta)$ depends on $\Delta$ and $\delta$ only. We have established that, for $x = v\sqrt{n}$, as $n \to \infty$, the relations

$$I_1 + I_2 = \sqrt{\frac{2\pi}{n}}\, e^{-v^2/2} + o\Bigl( \frac{1}{\sqrt{n}} \Bigr),$$

$$\mathbf{P}(Z_n \in \Delta[x)) = \frac{\Delta}{\sqrt{2\pi n}}\, e^{-x^2/(2n)} + o\Bigl( \frac{1}{\sqrt{n}} \Bigr) \tag{8.7.13}$$

hold uniformly in $v$ (see (8.7.5)). This means that representation (8.7.13) holds uniformly for all $x$.

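Further, by (8.7.3),

$$\{Z_n \in [x, x + \Delta - \delta)\} \subset \{S_n \in \Delta[x)\} \subset \{Z_n \in [x - \delta, x + \Delta)\} \tag{8.7.14}$$

and, so, in particular,

$$\mathbf{P}(S_n \in \Delta[x)) \le \frac{\Delta + \delta}{\sqrt{2\pi n}}\, e^{-(x - \delta)^2/(2n)} + o\Bigl( \frac{1}{\sqrt{n}} \Bigr) = \frac{\Delta + \delta}{\sqrt{2\pi n}}\, e^{-x^2/(2n)} + o\Bigl( \frac{1}{\sqrt{n}} \Bigr).$$

By (8.7.14) an analogous converse inequality also holds. Since $\delta$ is arbitrary, this is possible only if

$$\mathbf{P}(S_n \in \Delta[x)) = \frac{\Delta}{\sqrt{2\pi n}}\, e^{-x^2/(2n)} + o\Bigl( \frac{1}{\sqrt{n}} \Bigr). \tag{8.7.15}$$

The theorem is proved. □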


8.7.2  Local  Theorems 

If the distribution of $S_n$ has a density then we can obtain local theorems on the asymptotics of this density.

Theorem 8.7.2 Let $\mathbf{E}\xi = 0$, $\mathbf{E}\xi^2 = \sigma^2 < \infty$ and suppose there exists an $m \ge 1$ such that at least one of the following three conditions is met:

(a) the distribution of $S_m$ has a bounded density;

(b) the distribution of $S_m$ has a density from $L_2$;

(c) the ch.f. $\varphi^m(t)$ of the sum $S_m$ is integrable.

Then, for $n \ge m$, the distribution of the sum $S_n$ has density $f_{S_n}(x)$ for which the representation

$$f_{S_n}(x) = \frac{1}{\sqrt{2\pi n}\,\sigma} \Bigl( \exp\Bigl\{ -\frac{x^2}{2n\sigma^2} \Bigr\} + o(1) \Bigr) \tag{8.7.16}$$

holds uniformly in $x$ as $n \to \infty$.

Conditions (a)–(c) are equivalent to each other (possibly with different values of $m$).

Proof We first establish the equivalence of (a)–(c). The fact that a bounded density belongs to $L_2$ was proved in Sect. 7.2.3. Conversely, if $f \in L_2$ then

$$f^{(2)*}(t) = \int f(u) f(t - u)\, du \le \Bigl[ \int f^2(u)\, du \times \int f^2(t - u)\, du \Bigr]^{1/2} = \int f^2(u)\, du < \infty.$$

Hence the relationship $f_{S_m} \in L_2$ implies the boundedness of $f_{S_{2m}}$, and thus (a) and (b) are equivalent.

If $\varphi^m$ is integrable then by Theorem 7.2.2 the density $f_{S_m}$ exists and is bounded. Conversely, if $f_{S_m}$ is bounded then $f_{S_m} \in L_2$, $\varphi_{S_m} \in L_2$ and $\varphi_{S_{2m}} \in L_1$ (see Sect. 7.2). This proves the equivalence of (a) and (c).

We will now prove (8.7.16). By the inversion formula (7.2.1),

$$f_{S_n}(x) = \frac{1}{2\pi} \int e^{-itx} \varphi^n(t)\, dt.$$

Here the integral on the right-hand side does not "qualitatively" differ from the integral on the right-hand side of (8.7.5): we only have to put $\widehat{\varphi}(t) \equiv 1$ in the part $I_1$ of the integral (8.7.5) (the integral over the set $|t| < \gamma$), and, in the part $I_2$ (over the set $|t| \ge \gamma$), to replace the integrable function $\widehat{\varphi}(t)$ with the integrable function $\varphi^m(t)$ and to replace the function $\varphi^n(t)$ with $\varphi^{n-m}(t)$. After these changes the whole argument in the proof of relation (8.7.13) remains valid, and therefore the same relation (up to the factor $\Delta$) will hold for

$$f_{S_n}(x) = \frac{1}{\sqrt{2\pi n}\,\sigma} \exp\Bigl\{ -\frac{x^2}{2n\sigma^2} \Bigr\} + o\Bigl( \frac{1}{\sqrt{n}} \Bigr).$$

The theorem is proved. □


Theorem 8.7.2 implies that the density $f_{\zeta_n}$ of the random variable $\zeta_n = S_n/(\sigma\sqrt{n})$ converges to the density $\phi$ of the standard normal law:

$$f_{\zeta_n}(v) \to \phi(v)$$

uniformly in $v$ as $n \to \infty$.

For instance, the density of the uniform distribution over $[-1, 1]$ satisfies the conditions of this theorem, and hence the density of $S_n$ at the point $x = v\sigma\sqrt{n}$ ($\sigma^2 = 1/3$) will behave as $\frac{1}{\sigma\sqrt{2\pi n}}\, e^{-v^2/2}$ (cf. the remark to Example 3.6.1).
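The uniform example lends itself to a direct numerical check, because the density of a sum of uniform variables is known in closed form. The sketch below is an illustration that assumes the classical Irwin-Hall density formula; the function names and the grid size are hypothetical. It compares the exact density of $S_n$ with the main term of (8.7.16) and reports $\sqrt{n}$ times the largest gap.

```python
import math

def uniform_sum_density(n, s):
    """Density at s of the sum of n independent variables uniform on [-1, 1]."""
    t = (s + n) / 2.0                   # S_n = 2 T - n, where T is a sum of n uniforms on [0, 1]
    if t <= 0.0 or t >= n:
        return 0.0
    total = sum((-1) ** k * math.comb(n, k) * (t - k) ** (n - 1)
                for k in range(int(math.floor(t)) + 1))
    return 0.5 * total / math.factorial(n - 1)

def normal_limit(n, s):
    """Main term of (8.7.16) with sigma^2 = 1/3."""
    var = n / 3.0
    return math.exp(-s * s / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def scaled_density_gap(n, points=400):
    """sqrt(n) * max_s |f_{S_n}(s) - main term| over a grid on [-n, n]."""
    grid = (-n + 2.0 * n * i / points for i in range(points + 1))
    worst = max(abs(uniform_sum_density(n, s) - normal_limit(n, s)) for s in grid)
    return math.sqrt(n) * worst

for n in (2, 5, 12):
    print(n, scaled_density_gap(n))
```

The alternating sum in the closed-form density loses accuracy for large $n$ in floating point, so the sketch is meant for small $n$ only.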


In the arithmetic case, where the random variable $\xi$ is integer-valued and the greatest common divisor of all possible values of $\xi$ equals 1 (see Sect. 7.1), it is the asymptotics of the probabilities $\mathbf{P}(S_n = x)$ for integer $x$ that become the subject of interest for local theorems. In this case we cannot assume without loss of generality that $\mathbf{E}\xi = 0$.


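Theorem 8.7.3 (Gnedenko) Let $\mathbf{E}\xi = a$, $\operatorname{Var}(\xi) = \sigma^2 < \infty$ and $\xi$ have an arithmetic distribution. Then, uniformly over all integers $x$, as $n \to \infty$,

$$\mathbf{P}(S_n = x) = \frac{1}{\sqrt{2\pi n}\,\sigma} \exp\Bigl\{ -\frac{(x - an)^2}{2n\sigma^2} \Bigr\} + o\Bigl( \frac{1}{\sqrt{n}} \Bigr). \tag{8.7.17}$$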


Proof When proving limit theorems for arithmetic $\xi$, it is more convenient to use the generating functions (see Sects. 7.1, 7.7)

$$p(z) \equiv p_\xi(z) := \mathbf{E} z^\xi, \qquad |z| = 1,$$

so that $p(e^{it}) = \varphi(t)$, where $\varphi$ is the ch.f. of $\xi$.

In this case the inversion formulas take the following form (see (7.2.10)): for integer $x$,

$$\mathbf{P}(\xi = x) = \frac{1}{2\pi i} \oint_{|z| = 1} z^{-x-1} p(z)\, dz,$$

$$\mathbf{P}(S_n = x) = \frac{1}{2\pi i} \oint_{|z| = 1} z^{-x-1} p^n(z)\, dz = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-itx} \varphi^n(t)\, dt.$$

As in the proof of Theorem 8.7.1, here we split the integral on the right-hand side into two subintegrals: over the domain $|t| < \gamma$ and over the complementary set. The treatment of the first subintegral

$$I_1 := \int_{|t| < \gamma} e^{-itx} \varphi^n(t)\, dt = \int_{|t| < \gamma} e^{-ity} [e^{-ita} \varphi(t)]^n\, dt$$

for $y = x - an$ differs from the considerations for $I_1$ in Theorem 8.7.1 only in that it is simpler and yields (see (8.7.10))

$$I_1 = \frac{\sqrt{2\pi}}{\sigma\sqrt{n}} \exp\Bigl\{ -\frac{y^2}{2n\sigma^2} \Bigr\} + o\Bigl( \frac{1}{\sqrt{n}} \Bigr).$$

Similarly, the treatment of the second subintegral differs from that of $I_2$ in Theorem 8.7.1 in that it becomes simpler, since the range of integration here is compact and on that one has

$$|\varphi(t)| \le q(\gamma) < 1. \tag{8.7.18}$$

Therefore, as in Theorem 8.7.1,

$$I_2 = o\Bigl( \frac{1}{\sqrt{n}} \Bigr), \qquad \mathbf{P}(S_n = x) = \frac{1}{\sqrt{2\pi n}\,\sigma} \exp\Bigl\{ -\frac{y^2}{2n\sigma^2} \Bigr\} + o\Bigl( \frac{1}{\sqrt{n}} \Bigr).$$

The theorem is proved. □

Evidently, for the values of $y$ of order $\sqrt{n}$, Theorem 8.7.3 is a generalisation of the local limit theorem for the Bernoulli scheme (see Corollary 5.2.1).
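For the Bernoulli scheme the statement of Theorem 8.7.3 can be checked directly against the binomial probabilities. The following is a minimal sketch under the assumption that $\xi$ takes the values 0 and 1 with probabilities $1 - p$ and $p$, so that $a = p$ and $\sigma^2 = p(1 - p)$; the function names are illustrative.

```python
import math

def binomial_pmf(n, p, k):
    """P(S_n = k) for a sum of n Bernoulli(p) variables, via logarithms for stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def local_normal_term(n, p, k):
    """Main term of (8.7.17) with a = p and sigma^2 = p (1 - p)."""
    var = n * p * (1.0 - p)
    return math.exp(-(k - n * p) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def scaled_local_error(n, p):
    """sqrt(n) * max over k = 0, ..., n of |P(S_n = k) - main term|."""
    worst = max(abs(binomial_pmf(n, p, k) - local_normal_term(n, p, k))
                for k in range(n + 1))
    return math.sqrt(n) * worst

for n in (10, 100, 1000):
    print(n, scaled_local_error(n, 0.3))
```

The printed values should decrease with $n$, in line with the remainder $o(1/\sqrt{n})$ being uniform in $x$.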


8.7.3  The  Proof  of  Theorem  8.7.1  in  the  General  Case 


To prove Theorem 8.7.1 in the general case we will use the same approach as in Sect. 8.7.1. We will again employ the smoothing method, but now, when specifying the random variable $Z_n$ in (8.7.3), we will take $\theta\eta$ instead of $\eta_\delta$, where $\theta = \text{const}$, $\eta$ is a random variable with the ch.f. from Example 7.2.1 (see the end of Sect. 7.2) equal to

$$\varphi_\eta(t) = \begin{cases} 1 - |t|, & |t| \le 1; \\ 0, & |t| > 1, \end{cases}$$

so that for $Z_n = S_n + \theta\eta$, similarly to (8.7.5), we have

$$\mathbf{P}(Z_n \in \Delta[x)) = \frac{\Delta}{2\pi} \int e^{-itx} \varphi^n(t) \varphi_{\eta_\Delta}(t) \varphi_{\theta\eta}(t)\, dt, \tag{8.7.19}$$

where $\varphi_{\theta\eta}(t) = \max(0, 1 - \theta|t|)$. As in Sect. 8.7.1, split the integral on the right-hand side of (8.7.19) into two subintegrals: $I_1$ over the domain $|t| < \gamma$ and $I_2$ over the domain $\gamma \le |t| \le 1/\theta$. The asymptotic behaviour of these integrals is investigated in almost the same way as in Sect. 8.7.1, but is somewhat simpler, since the domain of integration in $I_2$ is compact, and so, by the non-latticeness of $\xi$, one has on it the upper bound

$$q := \sup_{\gamma \le |t| \le 1/\theta} |\varphi(t)| < 1. \tag{8.7.20}$$

Therefore, to bound $I_2$ we no longer need condition (8.7.2).

Thus we have established, as above, relation (8.7.13).

To derive from this fact the required relation (8.7.15) we will need the following.


Lemma 8.7.1 Let $f(y)$ be a bounded uniformly continuous function, $\eta$ an arbitrary proper random variable independent of $S_n$ and $b(n) \to \infty$ as $n \to \infty$. If for any fixed $\Delta > 0$ and $\theta > 0$, as $n \to \infty$, we have

$$\mathbf{P}(S_n + \theta\eta \in \Delta[x)) = \frac{\Delta}{b(n)} \Bigl[ f\Bigl( \frac{x}{b(n)} \Bigr) + o(1) \Bigr] \tag{8.7.21}$$

then

$$\mathbf{P}(S_n \in \Delta[x)) = \frac{\Delta}{b(n)} \Bigl[ f\Bigl( \frac{x}{b(n)} \Bigr) + o(1) \Bigr]. \tag{8.7.22}$$

In this assertion we can take $S_n$ to be any sequence of random variables satisfying (8.7.21). In this section we will set $b(n)$ to be equal to $\sqrt{n}$, but later (see the proof of Theorem A7.2.1 in Appendix 7) we will need some other sequences as well.


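Proof Put $\theta := \delta^2 \Delta$, where $\delta > 0$ will be chosen later, $\Delta_\pm := (1 \pm 2\delta)\Delta$, $\Delta_\pm[x) := [x, x + \Delta_\pm)$ and $f_0 := \max f(y)$. We first obtain an upper bound for $\mathbf{P}(S_n \in \Delta[x))$. We have

$$\mathbf{P}(Z_n \in \Delta_+[x - \Delta\delta)) \ge \mathbf{P}(Z_n \in \Delta_+[x - \Delta\delta);\ |\eta| < 1/\delta).$$

On the event $|\eta| < 1/\delta$ one has $-\delta\Delta < \theta\eta < \delta\Delta$, and hence on this event

$$\{Z_n \in \Delta_+[x - \Delta\delta)\} \supset \{S_n \in \Delta[x)\}.$$

Thus, by independence of $\eta$ and $S_n$,

$$\mathbf{P}(Z_n \in \Delta_+[x - \Delta\delta)) \ge \mathbf{P}(S_n \in \Delta[x);\ |\eta| < 1/\delta) = \mathbf{P}(S_n \in \Delta[x))(1 - h(\delta)),$$

where $h(\delta) := \mathbf{P}(|\eta| \ge 1/\delta) \to 0$ as $\delta \to 0$. By condition (8.7.21) and the uniform continuity of $f$ we obtain

$$\mathbf{P}(S_n \in \Delta[x)) \le \mathbf{P}(Z_n \in \Delta_+[x - \Delta\delta))(1 - h(\delta))^{-1} \le \Bigl[ \frac{\Delta}{b(n)} f\Bigl( \frac{x}{b(n)} \Bigr) + \frac{2\delta\Delta f_0}{b(n)} + o\Bigl( \frac{1}{b(n)} \Bigr) \Bigr] (1 - h(\delta))^{-1}. \tag{8.7.23}$$

If, for a given $\varepsilon > 0$, we choose $\delta > 0$ such that

$$(1 - h(\delta))^{-1} \le 1 + \frac{\varepsilon}{3 f_0}, \qquad 2\delta f_0 \le \frac{\varepsilon}{3},$$

then we derive from (8.7.23) that, for all $n$ large enough and $\varepsilon$ small enough,

$$\mathbf{P}(S_n \in \Delta[x)) \le \frac{\Delta}{b(n)} \Bigl( f\Bigl( \frac{x}{b(n)} \Bigr) + \varepsilon \Bigr). \tag{8.7.24}$$

This implies, in particular, that for all $x$,

$$\mathbf{P}(S_n \in \Delta[x)) \le \frac{\Delta}{b(n)} (f_0 + \varepsilon). \tag{8.7.25}$$

Now we will obtain a lower bound for $\mathbf{P}(S_n \in \Delta[x))$. For the event

$$A := \{Z_n \in \Delta_-[x + \Delta\delta)\}$$

we have

$$\mathbf{P}(A) = \mathbf{P}(A;\ |\eta| < 1/\delta) + \mathbf{P}(A;\ |\eta| \ge 1/\delta). \tag{8.7.26}$$

On the event $|\eta| < 1/\delta$ we have

$$\{Z_n \in \Delta_-[x + \Delta\delta)\} \subset \{S_n \in \Delta[x)\},$$

and hence

$$\mathbf{P}(A;\ |\eta| < 1/\delta) \le \mathbf{P}(S_n \in \Delta[x)).$$

Further, by independence of $\eta$ and $S_n$ and inequality (8.7.25),

$$\mathbf{P}(A;\ |\eta| \ge 1/\delta) = \mathbf{E}[\mathbf{P}(A \mid \eta);\ |\eta| \ge 1/\delta] = \mathbf{E}[\mathbf{P}(S_n \in \Delta_-[x - \theta\eta + \Delta\delta) \mid \eta);\ |\eta| \ge 1/\delta] \le \frac{\Delta}{b(n)} (f_0 + \varepsilon) h(\delta). \tag{8.7.27}$$

Therefore, combining (8.7.26), (8.7.27) and (8.7.21), we get

$$\mathbf{P}(S_n \in \Delta[x)) \ge \frac{\Delta}{b(n)} f\Bigl( \frac{x}{b(n)} \Bigr) - \frac{2\delta\Delta f_0}{b(n)} + o\Bigl( \frac{1}{b(n)} \Bigr) - \frac{\Delta}{b(n)} (f_0 + \varepsilon) h(\delta).$$

In addition, choosing $\delta$ such that

$$f_0 h(\delta) < \frac{\varepsilon}{3}, \qquad 2\delta f_0 < \frac{\varepsilon}{3},$$

we obtain that, for all $n$ large enough and $\varepsilon$ small enough,

$$\mathbf{P}(S_n \in \Delta[x)) \ge \frac{\Delta}{b(n)} \Bigl( f\Bigl( \frac{x}{b(n)} \Bigr) - \varepsilon \Bigr). \tag{8.7.28}$$

Since $\varepsilon$ is arbitrarily small, inequalities (8.7.24) and (8.7.28) prove the required relation (8.7.22). The lemma is proved. □

To prove the theorem it remains to apply Lemma 8.7.1 in the case (see (8.7.13)) where $f = \phi$ and $b(n) = \sqrt{n}$. Theorem 8.7.1 is proved. □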
8.7.4 Uniform Versions of Theorems 8.7.1-8.7.3 for Random
Variables Depending on a Parameter

In the next chapter, we will need uniform versions of Theorems 8.7.1-8.7.3, where
the summands depend on a parameter $\lambda$. Denote such summands by $\xi_{(\lambda)k}$, the
corresponding distributions by $\mathbf{F}_{(\lambda)}$, and put

$$S_{(\lambda)n} := \sum_{k=1}^{n} \xi_{(\lambda)k},$$

where $\xi_{(\lambda)k}$ are independent copies of $\xi_{(\lambda)} \mathrel{\subset\!\!\!=} \mathbf{F}_{(\lambda)}$. If $\lambda$ is only determined by the
number of summands $n$ then we will be dealing with the triangular array scheme
considered in Sects. 8.3-8.6 (the summands there were denoted by $\xi_{k,n}$). In the
general case we will take the segment $[0, \lambda_1]$ for some $\lambda_1 > 0$ as the parametric set,
keeping in mind that $\lambda \in [0, \lambda_1]$ may depend on $n$ (in the triangular array scheme
one can put $\lambda = 1/n$).

We will be interested in what conditions must be imposed on a family of
distributions $\mathbf{F}_{(\lambda)}$ for the assertions of Theorems 8.7.1-8.7.3 to hold uniformly in
$\lambda \in [0, \lambda_1]$. We introduce the following notation:

$$a(\lambda) = \mathbf{E}\xi_{(\lambda)}, \qquad \sigma^2(\lambda) = \mathrm{Var}(\xi_{(\lambda)}), \qquad \varphi_{(\lambda)}(t) = \mathbf{E}e^{it\xi_{(\lambda)}}.$$

The  next  assertion  is  an  analogue  of  Theorem  8.7.1. 

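Theorem 8.7.1A Let the distributions $\mathbf{F}_{(\lambda)}$ satisfy the following properties: $0 < \sigma_1 < \sigma(\lambda) < \sigma_2 < \infty$, where $\sigma_1$ and $\sigma_2$ do not depend on $\lambda$:

(a) the relation

$$\varphi_{(\lambda)}(t) - 1 - i a(\lambda) t + \frac{t^2 m_2(\lambda)}{2} = o(t^2), \qquad m_2(\lambda) := \mathbf{E}\xi_{(\lambda)}^2, \tag{8.7.29}$$

holds uniformly in $\lambda \in [0, \lambda_1]$ as $t \to 0$, i.e. there exist a $t_0 > 0$ and a function
$\varepsilon(t) \to 0$ as $t \to 0$, independent of $\lambda$, such that, for all $|t| < t_0$, the absolute
value of the left-hand side of (8.7.29) does not exceed $\varepsilon(t)t^2$;

(b) for any fixed $0 < \theta_1 < \theta_2 < \infty$,

$$q_{(\lambda)} := \sup_{\theta_1 \le |t| \le \theta_2} |\varphi_{(\lambda)}(t)| \le q < 1, \tag{8.7.30}$$

where $q$ does not depend on $\lambda$.

Then, for each fixed $\Delta > 0$,

$$\mathbf{P}\big(S_{(\lambda)n} - n a(\lambda) \in \Delta[x)\big) = \frac{\Delta}{\sigma(\lambda)\sqrt{n}}\,\phi\!\left(\frac{x}{\sigma(\lambda)\sqrt{n}}\right) + o\!\left(\frac{1}{\sqrt{n}}\right), \tag{8.7.31}$$

where the remainder term $o(1/\sqrt{n})$ is uniform in $x$ and $\lambda \in [0, \lambda_1]$.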
Proof Going through the proof of Theorem 8.7.1 in its general form (see Sect. 7.3),
we see that, to ensure the validity of all the proofs of the intermediate assertions in
their uniform forms, it suffices to have uniformity in the following two places:

(a) the uniformity in $\lambda$ of the estimate $o(t^2)$ as $t \to 0$ in relation (8.7.6) for the
expansion of the ch.f. of the random variable $\xi = (\xi_{(\lambda)} - a(\lambda))/\sigma(\lambda)$;

(b) the uniformity in relation (8.7.20) for the same ch.f.

We verify the uniformity in (8.7.6). For $\varphi(t) = \mathbf{E}e^{it\xi}$, we have by (8.7.29)

$$\ln\varphi(t) = -\frac{i t a(\lambda)}{\sigma(\lambda)} + \ln\varphi_{(\lambda)}\!\left(\frac{t}{\sigma(\lambda)}\right)
= -\frac{t^2\big(m_2(\lambda) - a^2(\lambda)\big)}{2\sigma^2(\lambda)} + o(t^2) = -\frac{t^2}{2} + o(t^2),$$

where the remainder term is uniform in $\lambda$.

The uniformity in relation (8.7.20) clearly follows from condition (b), since $\sigma(\lambda)$
is uniformly separated from both 0 and $\infty$. The theorem is proved. □


Remark 8.7.3 Conditions (a) and (b) of Theorem 8.7.1A are essential for (8.7.31)
to hold. To see this, consider random variables $\xi$ and $\eta$ with fixed distributions,
$\mathbf{E}\xi = \mathbf{E}\eta = 0$ and $\mathbf{E}\xi^2 = \mathbf{E}\eta^2 = 1$. Let $\lambda \in [0, 1]$ and the random variable $\xi_{(\lambda)}$ be
defined by

$$\xi_{(\lambda)} := \begin{cases} \xi & \text{with probability } 1 - \lambda, \\ \eta/\sqrt{\lambda} & \text{with probability } \lambda, \end{cases} \tag{8.7.32}$$

so that $\mathbf{E}\xi_{(\lambda)} = 0$ and $\mathrm{Var}(\xi_{(\lambda)}) = 2 - \lambda$ (in the case of the triangular array scheme
one can put $\lambda = 1/n$). Then, under the obvious notational conventions, for $\lambda = t^2$,
$t \to 0$, we have

$$\varphi_{(\lambda)}(t) = (1 - \lambda)\varphi_\xi(t) + \lambda\varphi_\eta(t/\sqrt{\lambda}) = 1 - \frac{3t^2}{2} + o(t^2) + t^2\varphi_\eta(1).$$

This implies that (8.7.29) does not hold and hence condition (a) is not met for the
values of $\lambda$ in the vicinity of zero. At the same time, the uniform versions of relation
(8.7.31) and the central limit theorem will fail to hold. Indeed, putting $\lambda = 1/n$, we
obtain the triangular array scheme, in which the number $\nu_n$ of the summands of the
form $\eta_i/\sqrt{\lambda}$ in the sum $S_{(\lambda)n} = \sum_{i=1}^{n} \xi_{(\lambda)i}$ converges in distribution to $\nu \mathrel{\subset\!\!\!=} \mathbf{\Pi}_1$ and

$$\frac{1}{\sqrt{n(2 - \lambda)}}\, S_{(\lambda)n} \stackrel{d}{=} \frac{S_{n - \nu_n}}{\sqrt{2n - 1}} + \frac{H_{\nu_n}}{\sqrt{2 - 1/n}}, \qquad \text{where } H_k = \sum_{i=1}^{k} \eta_i.$$

The first term on the right-hand side converges in distribution to $\zeta \mathrel{\subset\!\!\!=} \mathbf{\Phi}_{0,1/2}$,
while the second term converges to $H_\nu/\sqrt{2}$. Clearly, the sum of these independent
summands is, generally speaking, not distributed normally with parameters (0, 1).
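
The failure of condition (a) in this example can be checked numerically. The sketch below is only an illustration under assumptions that are not in the text: it takes $\xi$ standard normal and $\eta = \pm 1$ with equal probabilities (so that $\varphi_\xi(t) = e^{-t^2/2}$ and $\varphi_\eta(s) = \cos s$), and the function names are hypothetical. It evaluates the left-hand side of (8.7.29) divided by $t^2$, once for a fixed $\lambda$ and once along $\lambda = t^2$.

```python
import math


def phi_mixture(t: float, lam: float) -> float:
    """Ch.f. of xi_(lam) from (8.7.32), assuming xi ~ N(0, 1) and eta = +-1.

    Both components are symmetric, so the ch.f. is real-valued.
    """
    return (1.0 - lam) * math.exp(-t * t / 2.0) + lam * math.cos(t / math.sqrt(lam))


def scaled_remainder(t: float, lam: float) -> float:
    """Left-hand side of (8.7.29) divided by t^2.

    Here a(lam) = 0 and m2(lam) = Var(xi_(lam)) = 2 - lam.
    """
    m2 = 2.0 - lam
    return (phi_mixture(t, lam) - 1.0 + t * t * m2 / 2.0) / (t * t)


if __name__ == "__main__":
    for t in (0.1, 0.01):
        fixed = scaled_remainder(t, 0.5)       # fixed lambda: tends to 0
        moving = scaled_remainder(t, t * t)    # lambda = t^2: does not tend to 0
        print(f"t={t}: fixed lambda -> {fixed:.6f}, lambda=t^2 -> {moving:.6f}")
```

Under these assumptions the second quantity approaches $\varphi_\eta(1) - 1/2 = \cos 1 - 1/2 \ne 0$, which agrees with the expansion displayed in the remark, while for a fixed $\lambda$ the scaled remainder vanishes as $t \to 0$.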
To see that condition (b) is also essential, consider an arithmetic random variable
$\xi$ with $\mathbf{E}\xi = 0$ and $\mathrm{Var}(\xi) = 1$, take $\eta$ to be a random variable with the uniform
distribution $\mathbf{U}_{0,1}$, and put

$$\xi_{(\lambda)} := \begin{cases} \xi & \text{with probability } 1 - \lambda, \\ \eta & \text{with probability } \lambda. \end{cases}$$

Here the random variable $\xi_{(\lambda)}$ is non-lattice (its distribution has an absolutely
continuous component), but

$$\varphi_{(\lambda)}(2\pi) = (1 - \lambda) + \lambda\varphi_\eta(2\pi), \qquad q_{(\lambda)} \ge 1 - 2\lambda.$$

Again putting $\lambda = 1/n$, we get the triangular array scheme for which condition (b)
is not met. Relation (8.7.31) does not hold either, since, in the previous notation, the
sum $S_{(\lambda)n}$ is integer-valued with probability $\mathbf{P}(\nu_n = 0) \to e^{-1}$, so that its distribution
will have atoms at integer points with probabilities comparable, by Theorem 8.7.3,
with the right-hand side of (8.7.31). This clearly contradicts (8.7.31).

If we put $\lambda = 1/n^2$ then the sum $S_{(\lambda)n}$ will be integer-valued with probability
$(1 - 1/n^2)^n \to 1$, and the failure of relation (8.7.31) becomes even more evident.

Uniform versions of the local Theorems 8.7.2 and 8.7.3 are established in a
completely analogous way.

Theorem 8.7.2A Let the distributions $\mathbf{F}_{(\lambda)}$ satisfy the conditions of Theorem 8.7.1A
with $\theta_2 = \infty$ and the conditions of Theorem 8.7.2, in which conditions (a)-(c) are
understood in the uniform sense (i.e., $\max_x f_{S_{(\lambda)m}}(x)$ or the norm of $f_{S_{(\lambda)m}}$ in $L_2$ or
$\int |\varphi_{(\lambda)}^m(t)|\,dt$ are bounded uniformly in $\lambda \in [0, \lambda_1]$).

Then representation (8.7.16) holds for $f_{S_{(\lambda)n}}(x)$ uniformly in $x$ and $\lambda$, provided
that on its right-hand side we replace $\sigma$ by $\sigma(\lambda)$.

Proof The conditions of Theorem 8.7.2A are such that they enable one to obtain
the proof of the uniform version without any noticeable changes in the arguments
proving Theorems 8.7.1A and 8.7.2. □

The following assertion is established in the same way.

Theorem 8.7.3A Let the arithmetic distributions $\mathbf{F}_{(\lambda)}$ satisfy the conditions of
Theorem 8.7.1A for $\theta_2 = \pi$. Then representation (8.7.17) holds uniformly in $x$ and $\lambda$,
provided that $a$ and $\sigma$ on its right-hand side are replaced with $a(\lambda)$ and $\sigma(\lambda)$,
respectively.

Remark 8.7.3 applies to Theorems 8.7.2A and 8.7.3A as well.


8.8  Convergence  to  Other  Limiting  Laws 

As we saw in previous sections, the normal law occupies a special place among all
distributions: it is the limiting law for normed sums of arbitrarily distributed random
variables. There arises the natural question of whether there exist any other limiting
laws for sums of independent random variables.

It is clear from the proof of Theorem 8.2.1 for identically distributed random
variables that the character of the limiting law is determined by the behaviour of the
ch.f. of the summands in the vicinity of 0. If $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 = \sigma^2 = -\varphi''(0)$ exist,
then

$$\varphi\!\left(\frac{t}{\sqrt{n}}\right) = 1 - \frac{\sigma^2 t^2}{2n} + o\!\left(\frac{1}{n}\right),$$

and this determines the asymptotic behaviour of the ch.f. of $S_n/\sqrt{n}$, equal to
$\varphi^n(t/\sqrt{n})$, which leads to the normal limiting law. Therefore, if one is looking for
different limiting laws for the sums $S_n = \xi_1 + \cdots + \xi_n$, it is necessary to renounce
the condition that the variance is finite or, which is the same, that $\varphi''(0)$ exists. In
this case, however, we will have to impose some conditions on the regular variation
of the functions $F_+(x) = \mathbf{P}(\xi \ge x)$ and/or $F_-(x) = \mathbf{P}(\xi < -x)$ as $x \to \infty$, which
we will call the right and the left tail of the distribution of $\xi$, respectively. We will
need the following concepts.


Definition 8.8.1 A positive (Lebesgue) measurable function $L(t)$ is called a slowly
varying function (s.v.f.) as $t \to \infty$, if, for any fixed $v > 0$,

$$\frac{L(vt)}{L(t)} \to 1 \quad \text{as } t \to \infty. \tag{8.8.1}$$

A function $V(t)$ is called a regularly varying function (r.v.f.) (of index $-\beta$) as
$t \to \infty$ if it can be represented as

$$V(t) = t^{-\beta} L(t), \tag{8.8.2}$$

where $L(t)$ is an s.v.f. as $t \to \infty$.


One can easily see that, similarly to (8.8.1), the characteristic property of
regularly varying functions is the convergence

$$\frac{V(vt)}{V(t)} \to v^{-\beta} \quad \text{as } t \to \infty \tag{8.8.3}$$

for any fixed $v > 0$. Thus an s.v.f. is an r.v.f. of index zero.

Among typical representatives of the class of s.v.f.s are the logarithmic function and
its powers $\ln^\gamma t$, $\gamma \in \mathbb{R}$, linear combinations thereof, multiple logarithms, functions
with the property that $L(t) \to L = \mathrm{const} \ne 0$ as $t \to \infty$, etc. As an example of a
bounded oscillating s.v.f. we mention

$$L_0(t) = 2 + \sin(\ln\ln t), \qquad t > 1.$$

The main properties of r.v.f.s are given in Appendix 6.
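
Properties (8.8.1) and (8.8.3) are easy to probe numerically. The following sketch (helper names are hypothetical, and the test function $t^{-1/2}\ln t$ is chosen here only for illustration) prints the ratio $f(vt)/f(t)$ for the logarithm, for the oscillating function $L_0$ above, and for an r.v.f. of index $-1/2$.

```python
import math


def l_log(t: float) -> float:
    """Slowly varying: L(t) = ln t."""
    return math.log(t)


def l_osc(t: float) -> float:
    """Bounded oscillating s.v.f. L_0(t) = 2 + sin(ln ln t), t > 1."""
    return 2.0 + math.sin(math.log(math.log(t)))


def v_power(t: float) -> float:
    """Regularly varying of index -1/2: V(t) = t^(-1/2) * ln t."""
    return t ** -0.5 * math.log(t)


def ratio(func, v: float, t: float) -> float:
    """The ratio func(v t) / func(t) appearing in (8.8.1) and (8.8.3)."""
    return func(v * t) / func(t)


if __name__ == "__main__":
    v = 10.0
    for t in (1e3, 1e30, 1e300):
        print(t, ratio(l_log, v, t), ratio(l_osc, v, t), ratio(v_power, v, t))
    print("v^(-1/2) =", v ** -0.5)
```

The convergence in (8.8.1) can be very slow: for $\ln t$ the ratio equals $1 + \ln v/\ln t$, so it is still visibly different from 1 at moderate $t$.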
As has already been noted, for $S_n/b(n)$ to converge to a "nondegenerate" limiting
law under a suitable normalisation $b(n)$, we will have to impose conditions on the
regular variation of the distribution tails of $\xi$. More precisely, we will need a regular
variation of the "two-sided tail"

$$F_0(t) = F_-(t) + F_+(t) = \mathbf{P}(\xi \notin [-t, t)).$$

We will assume that the following condition is satisfied for some $\beta \in (0, 2]$,
$\rho \in [-1, 1]$:

$[\mathbf{R}_{\beta,\rho}]$ The two-sided tail $F_0(x) = F_-(x) + F_+(x)$ is an r.v.f. as $x \to \infty$, i.e. it
can be represented as

$$F_0(x) = x^{-\beta} L_{F_0}(x), \qquad \beta \in (0, 2], \tag{8.8.4}$$

where $L_{F_0}(x)$ is an s.v.f., and the following limit exists

$$\rho_+ := \lim_{x \to \infty} \frac{F_+(x)}{F_0(x)} \in [0, 1], \qquad \rho := 2\rho_+ - 1. \tag{8.8.5}$$

If $\rho_+ > 0$, then clearly the right tail $F_+(x)$ is an r.v.f. like $F_0(x)$, i.e. it can be
represented as

$$F_+(x) = V(x) := x^{-\beta} L(x), \qquad \beta \in (0, 2], \qquad L(x) \sim \rho_+ L_{F_0}(x).$$

(Here, and likewise in Appendix 6, we use the symbol $V$ to denote an r.v.f.) If
$\rho_+ = 0$, then the right tail $F_+(x) = o(F_0(x))$ is not assumed to be regularly varying.
Relation (8.8.5) implies that the following limit also exists

$$\rho_- := \lim_{x \to \infty} \frac{F_-(x)}{F_0(x)} = 1 - \rho_+.$$

If $\rho_- > 0$, then, similarly to the case of the right tail, the left tail $F_-(x)$ can be
represented as

$$F_-(x) = W(x) := x^{-\beta} L_W(x), \qquad \beta \in (0, 2], \qquad L_W(x) \sim \rho_- L_{F_0}(x).$$

If $\rho_- = 0$, then the left tail $F_-(x) = o(F_0(x))$ is not assumed to be regularly varying.

The parameters $\rho_\pm$ are related to the parameter $\rho$ in the notation $[\mathbf{R}_{\beta,\rho}]$ through
the equalities

$$\rho = \rho_+ - \rho_- = 2\rho_+ - 1 \in [-1, 1].$$

Clearly, in the case $\beta < 2$ we have $\mathbf{E}\xi^2 = \infty$, so that the representation

$$\varphi(t) = 1 - \frac{\sigma^2 t^2}{2} + o(t^2) \quad \text{as } t \to 0$$

no longer holds, and the central limit theorem is not applicable. If $\mathbf{E}\xi$ exists and is
finite then everywhere in what follows it will be assumed without loss of generality
that

$$\mathbf{E}\xi = 0.$$

Since $F_0(x)$ is non-increasing, there always exists the "generalised" inverse function
$F_0^{(-1)}(u)$ understood as

$$F_0^{(-1)}(u) := \inf\{x : F_0(x) < u\}.$$

If the function $F_0$ is strictly monotone and continuous then $b = F_0^{(-1)}(u)$ is the
unique solution to the equation

$$F_0(b) = u, \qquad u \in (0, 1).$$

Set

$$\zeta_n := \frac{S_n}{b(n)},$$

where in the case $\beta < 2$ we define the normalising factor $b(n)$ by

$$b(n) := F_0^{(-1)}(1/n). \tag{8.8.6}$$

For $\beta = 2$ put

$$b(n) := Y^{(-1)}(1/n), \tag{8.8.7}$$

where

$$Y(x) := 2x^{-2}\int_0^x y F_0(y)\,dy = 2x^{-2}\left[\int_0^x y F_+(y)\,dy + \int_0^x y F_-(y)\,dy\right]
= x^{-2}\,\mathbf{E}(\xi^2;\ -x \le \xi < x) = x^{-2} L_Y(x), \tag{8.8.8}$$

$L_Y$ is an s.v.f. (see Theorem A6.2.1(iv) in Appendix 6). It follows from
Theorem A6.2.1(v) in Appendix 6 that, under condition (8.8.4), we have

$$b(n) = n^{1/\beta} L_b(n), \qquad \beta \le 2,$$

where $L_b$ is an s.v.f.
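
To make definition (8.8.6) concrete, here is a minimal sketch that computes the generalised inverse $\inf\{x : F_0(x) < u\}$ by bisection. The Pareto-type tail $F_0(x) = \min(1, x^{-\beta})$ and all function names are assumptions chosen for illustration; for this tail the slowly varying factor is constant and $b(n) = n^{1/\beta}$ exactly.

```python
from typing import Callable


def generalised_inverse(tail: Callable[[float], float], u: float,
                        iters: int = 200) -> float:
    """Approximate inf{x >= 0 : tail(x) < u} for a non-increasing tail.

    Assumes tail(0) >= u and tail(x) -> 0 as x grows, so a bracket exists.
    """
    lo, hi = 0.0, 1.0
    while tail(hi) >= u:          # expand until tail(hi) < u
        hi *= 2.0
    for _ in range(iters):        # invariant: tail(lo) >= u > tail(hi)
        mid = 0.5 * (lo + hi)
        if tail(mid) < u:
            hi = mid
        else:
            lo = mid
    return hi


def pareto_two_sided_tail(beta: float) -> Callable[[float], float]:
    """Hypothetical two-sided tail F_0(x) = min(1, x^(-beta))."""
    def tail(x: float) -> float:
        return 1.0 if x < 1.0 else x ** (-beta)
    return tail


if __name__ == "__main__":
    beta = 1.5
    f0 = pareto_two_sided_tail(beta)
    for n in (10, 1000, 100000):
        b_n = generalised_inverse(f0, 1.0 / n)
        print(n, b_n, n ** (1.0 / beta))
```

Bisection is used only because it needs nothing beyond monotonicity of the tail; for a tail with a genuinely varying $L_{F_0}$ the same routine applies, but $b(n)$ then differs from $n^{1/\beta}$ by a slowly varying factor.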

We introduce the functions

$$V_I(x) = \int_0^x V(y)\,dy, \qquad V^I(x) = \int_x^\infty V(y)\,dy.$$


8.8.1  The  Integral  Theorem 

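Theorem 8.8.1 Let condition $[\mathbf{R}_{\beta,\rho}]$ be satisfied. Then the following assertions hold
true.

(i) For $\beta \in (0, 2)$, $\beta \ne 1$ and the normalising factor (8.8.6), as $n \to \infty$,

$$\zeta_n \Rightarrow \zeta^{(\beta,\rho)}. \tag{8.8.9}$$

The distribution $\mathbf{F}_{\beta,\rho}$ of the random variable $\zeta^{(\beta,\rho)}$ depends on parameters $\beta$
and $\rho$ only and has a ch.f. $\varphi^{(\beta,\rho)}(t)$, given by

$$\varphi^{(\beta,\rho)}(t) := \mathbf{E}e^{it\zeta^{(\beta,\rho)}} = \exp\{|t|^\beta B(\beta, \rho, \vartheta)\}, \tag{8.8.10}$$

where $\vartheta = \operatorname{sign} t$,

$$B(\beta, \rho, \vartheta) = \Gamma(1 - \beta)\left[i\rho\vartheta \sin\frac{\beta\pi}{2} - \cos\frac{\beta\pi}{2}\right], \tag{8.8.11}$$

and, for $\beta \in (1, 2)$, we put $\Gamma(1 - \beta) = \Gamma(2 - \beta)/(1 - \beta)$.

(ii) When $\beta = 1$, for the sequence $\zeta_n$ with the normalising factor (8.8.6) to
converge to a limiting law, the former, generally speaking, needs to be centred.
More precisely, as $n \to \infty$, the following convergence takes place:

$$\zeta_n - A_n \Rightarrow \zeta^{(1,\rho)}, \tag{8.8.12}$$

where

$$A_n = \frac{n}{b(n)}\big[V_I(b(n)) - W_I(b(n))\big] - \rho C, \tag{8.8.13}$$

$C \approx 0.5772$ is the Euler constant, and

$$\varphi^{(1,\rho)}(t) = \mathbf{E}e^{it\zeta^{(1,\rho)}} = \exp\left\{-\frac{\pi|t|}{2} - i\rho t \ln|t|\right\}. \tag{8.8.14}$$

If $n[V_I(b(n)) - W_I(b(n))] = o(b(n))$, then $\rho = 0$ and we can put $A_n = 0$.
If $\mathbf{E}\xi$ exists and equals zero then

$$A_n = \frac{n}{b(n)}\big[W^I(b(n)) - V^I(b(n))\big] - \rho C.$$

If $\mathbf{E}\xi = 0$ and $\rho \ne 0$ then $\rho A_n \to -\infty$ as $n \to \infty$.

(iii) For $\beta = 2$ and the normalising factor (8.8.7), as $n \to \infty$,

$$\zeta_n \Rightarrow \zeta^{(2,\rho)}, \qquad \varphi^{(2,\rho)}(t) := \mathbf{E}e^{it\zeta^{(2,\rho)}} = e^{-t^2/2},$$

so that $\zeta^{(2,\rho)}$ has the standard normal distribution that is independent of $\rho$.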

The  Proof  of  Theorem  8.8.1  is  based  on  the  same  considerations  as  the  proof  of 
Theorem  8.2.1,  i.e.  on  using  the  asymptotic  behaviour  of  the  ch.f.  <p(t)  in  the  vicinity 
of  zero.  But  here  it  will  be  somewhat  more  difficult  from  the  technical  viewpoint. 
This  is  why  the  proof  of  Theorem  8.8.1  appears  in  Appendix  7.  □ 
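Remark 8.8.1 The last assertion of the theorem (for $\beta = 2$) shows that the limiting
distribution may be normal even in the case of infinite variance of $\xi$.

Besides the normal distribution, we also note "extreme" limit distributions,
corresponding to $\rho = \pm 1$, where the ch.f. $\varphi^{(\beta,\rho)}$ (or the respective Laplace
transform) takes a very simple form. Let, for example, $\rho = -1$. Since $e^{i\pi\vartheta/2} = \vartheta i$, then,
for $\beta \ne 1, 2$,

$$B(\beta, -1, \vartheta) = -\Gamma(1 - \beta)\left[i\vartheta \sin\frac{\beta\pi}{2} + \cos\frac{\beta\pi}{2}\right]
= -\Gamma(1 - \beta)\,e^{i\beta\pi\vartheta/2} = -\Gamma(1 - \beta)(i\vartheta)^\beta,$$

$$\varphi^{(\beta,-1)}(t) = \exp\{-\Gamma(1 - \beta)(it)^\beta\}, \qquad \mathbf{E}e^{\lambda\zeta^{(\beta,-1)}} = \exp\{-\Gamma(1 - \beta)\lambda^\beta\}, \quad \operatorname{Re}\lambda \ge 0.$$

Similarly, for $\beta = 1$, by (8.8.14) and the equalities $-\frac{\pi\vartheta}{2} = i\,\frac{i\pi\vartheta}{2} = i \ln i^\vartheta$ we have

$$\ln\varphi^{(1,-1)}(t) = -\frac{\pi|t|}{2} + it\ln|t| = it\ln i^\vartheta + it\ln|t| = it\ln(it),$$

$$\mathbf{E}e^{\lambda\zeta^{(1,-1)}} = \exp\{\lambda\ln\lambda\}, \qquad \operatorname{Re}\lambda \ge 0.$$

A similar formula is valid for $\rho = 1$.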


Remark 8.8.2 If $\beta < 2$, then by virtue of the properties of s.v.f.s (see
Theorem A6.2.1(iv) in Appendix 6), as $x \to \infty$,

$$\int_0^x y F_0(y)\,dy = \int_0^x y^{1-\beta} L_{F_0}(y)\,dy \sim \frac{1}{2 - \beta}\,x^{2-\beta} L_{F_0}(x) = \frac{1}{2 - \beta}\,x^2 F_0(x).$$

Therefore, for $\beta < 2$, we have $Y(x) \sim 2(2 - \beta)^{-1} F_0(x)$,

$$Y^{(-1)}(1/n) \sim F_0^{(-1)}\!\left(\frac{2 - \beta}{2n}\right) \sim \left(\frac{2}{2 - \beta}\right)^{1/\beta} F_0^{(-1)}\!\left(\frac{1}{n}\right)$$

(cf. (8.8.6)). On the other hand, for $\beta = 2$ and $\sigma^2 := \mathbf{E}\xi^2 < \infty$ one has

$$Y(x) \sim x^{-2}\sigma^2, \qquad b(n) = Y^{(-1)}(1/n) \sim \sigma\sqrt{n}.$$

Thus normalisation (8.8.7) is "transitional" from normalisation (8.8.6) (up to the
constant factor $(2/(2 - \beta))^{1/\beta}$) to the standard normalisation $\sigma\sqrt{n}$ in the
central limit theorem in the case where $\mathbf{E}\xi^2 < \infty$. This also means that
normalisation (8.8.7) is "universal" and can be used for all $\beta \le 2$ (as it is done in many
textbooks on probability theory). However, as we will see below, in the case $\beta < 2$
normalisation (8.8.6) is easier and simpler to deal with, and therefore we will use
that scaling.
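
The asymptotic relation $Y(x) \sim 2(2 - \beta)^{-1} F_0(x)$ can be illustrated on a simple case. The sketch below assumes, purely for illustration, the Pareto-type tail $F_0(x) = \min(1, x^{-\beta})$ with $\beta < 2$, for which the integral in (8.8.8) has a closed form; the function names are hypothetical.

```python
def f0(x: float, beta: float) -> float:
    """Assumed two-sided tail F_0(x) = min(1, x^(-beta))."""
    return 1.0 if x < 1.0 else x ** (-beta)


def y_func(x: float, beta: float) -> float:
    """Y(x) = 2 x^(-2) * int_0^x y F_0(y) dy for x >= 1 and beta < 2.

    The integral splits into int_0^1 y dy = 1/2 and
    int_1^x y^(1-beta) dy = (x^(2-beta) - 1) / (2 - beta).
    """
    integral = 0.5 + (x ** (2.0 - beta) - 1.0) / (2.0 - beta)
    return 2.0 * integral / (x * x)


if __name__ == "__main__":
    beta = 1.5
    for x in (1e2, 1e4, 1e8):
        print(x, y_func(x, beta) / f0(x, beta), 2.0 / (2.0 - beta))
```

For $\beta$ close to 2 the convergence of the ratio slows down, which is consistent with the "transitional" character of normalisation (8.8.7) described in the remark.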
Recall that $\mathbf{F}_{\beta,\rho}$ denotes the distribution of the random variable $\zeta^{(\beta,\rho)}$. The
parameter $\beta$ takes values in the interval $(0, 2]$, the parameter $\rho = \rho_+ - \rho_-$ can assume
any value from $[-1, 1]$. The role of the parameters $\beta$ and $\rho$ will be clarified below.

Theorem 8.8.1 implies that each of the laws $\mathbf{F}_{\beta,\rho}$, $0 < \beta \le 2$ and $-1 \le \rho \le 1$, is
limiting for the distributions of suitably normalised sums of independent identically
distributed random variables. It follows from the law of large numbers that the
degenerate distribution $\mathbf{I}_a$ concentrated at the point $a$ is also a limiting one. Denote the
set of all such distributions by $\mathfrak{S}_0$. Furthermore, it is not hard to see that if $\mathbf{F}$ is a
distribution from the class $\mathfrak{S}_0$ then the law that differs from $\mathbf{F}$ by scaling and shifting,
i.e. the distribution $\mathbf{F}_{\{a,b\}}$ defined, for some fixed $b > 0$ and $a$, by the relation

$$\mathbf{F}_{\{a,b\}}(B) := \mathbf{F}\!\left(\frac{B - a}{b}\right), \qquad \text{where } \frac{B - a}{b} = \{u \in \mathbb{R} : ub + a \in B\},$$

is also limiting for the distributions of sums of random variables $(S_n - a_n)/b_n$ as
$n \to \infty$ for appropriate $\{a_n\}$ and $\{b_n\}$.

It turns out that the class of distributions $\mathfrak{S}$ obtained by the above extension from
$\mathfrak{S}_0$ exhausts all the limiting laws for sums of identically distributed independent
random variables.

Another  characterisation  of  the  class  of  limiting  laws  ©  is  also  possible. 


Definition 8.8.2 We call a distribution $\mathbf{F}$ stable if, for any $a_1$, $a_2$, $b_1 > 0$, $b_2 > 0$,
there exist $a$ and $b > 0$ such that

$$\mathbf{F}_{\{a_1,b_1\}} * \mathbf{F}_{\{a_2,b_2\}} = \mathbf{F}_{\{a,b\}}.$$

This definition means that the convolution of a stable distribution $\mathbf{F}$ with itself
again yields the same distribution $\mathbf{F}$, up to a scaling and shift (or, which is the
same, for independent random variables $\xi_i \mathrel{\subset\!\!\!=} \mathbf{F}$ we have $(\xi_1 + \xi_2 - a)/b \mathrel{\subset\!\!\!=} \mathbf{F}$ for
appropriate $a$ and $b$).

In terms of the ch.f. $\varphi$, the stability property has the following form: for any
$b_1 > 0$ and $b_2 > 0$, there exist $a$ and $b > 0$ such that

$$\varphi(tb_1)\varphi(tb_2) = e^{ita}\varphi(tb), \qquad t \in \mathbb{R}. \tag{8.8.15}$$

Denote the class of all stable laws by $\mathfrak{S}^S$. The remarkable fact is that the class of all
limiting laws $\mathfrak{S}$ (for $(S_n - a_n)/b_n$ for some $a_n$ and $b_n$) and the class of all stable
laws $\mathfrak{S}^S$ coincide.

If, under a suitable normalisation, as $n \to \infty$,

$$\zeta_n \Rightarrow \zeta^{(\beta,\rho)},$$

then one says that the distribution $\mathbf{F}$ of the summands $\xi$ belongs to the domain of
attraction of the stable law $\mathbf{F}_{\beta,\rho}$.

Theorem 8.8.1 means that, if $\mathbf{F}$ satisfies condition $[\mathbf{R}_{\beta,\rho}]$, then $\mathbf{F}$ belongs to the
domain of attraction of the stable law $\mathbf{F}_{\beta,\rho}$.
One can prove the converse assertion (see e.g. Chap. XVII, § 5 in [30]): if $\mathbf{F}$
belongs to the domain of attraction of a stable law $\mathbf{F}_{\beta,\rho}$ for $\beta < 2$, then $[\mathbf{R}_{\beta,\rho}]$ is
satisfied.

As for the role of the parameters $\beta$ and $\rho$, note the following. The parameter $\beta$
characterises the rate of convergence to zero as $x \to \infty$ for the functions

$$F_{\beta,\rho,-}(x) := \mathbf{F}_{\beta,\rho}((-\infty, -x)) \quad \text{and} \quad F_{\beta,\rho,+}(x) := \mathbf{F}_{\beta,\rho}([x, \infty)).$$

One can prove that, for $\rho_+ > 0$, as $t \to \infty$,

$$F_{\beta,\rho,+}(t) \sim \rho_+ t^{-\beta} \tag{8.8.16}$$

and, for $\rho_- > 0$, as $t \to \infty$,

$$F_{\beta,\rho,-}(t) \sim \rho_- t^{-\beta}. \tag{8.8.17}$$

Note that, for $\xi \mathrel{\subset\!\!\!=} \mathbf{F}_{\beta,\rho}$, the asymptotic relations in Theorem 8.8.1 turn into
precise equalities provided that we replace in them $b(n)$ with $b_n := n^{1/\beta}$. In particular,

$$\mathbf{P}\!\left(\frac{S_n}{b_n} \ge t\right) = F_{\beta,\rho,+}(t). \tag{8.8.18}$$

This follows from the fact that $[\varphi^{(\beta,\rho)}(t/b_n)]^n$ coincides with $\varphi^{(\beta,\rho)}(t)$ (see (8.8.10))
and hence the distribution of the normalised sum $S_n/b_n$ coincides with the
distribution of the random variable $\xi$.
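
The identity $[\varphi^{(\beta,\rho)}(t/b_n)]^n = \varphi^{(\beta,\rho)}(t)$ with $b_n = n^{1/\beta}$ can be verified directly from (8.8.10) and (8.8.11). The following sketch does this numerically for $\beta \ne 1$; the function name is hypothetical, and `math.gamma` is used for $\Gamma(1 - \beta)$, which for $\beta \in (1, 2)$ agrees with the convention $\Gamma(1 - \beta) = \Gamma(2 - \beta)/(1 - \beta)$ of Theorem 8.8.1.

```python
import cmath
import math


def stable_chf(t: float, beta: float, rho: float) -> complex:
    """Ch.f. (8.8.10)-(8.8.11) of the stable law F_{beta,rho}, beta != 1, 2."""
    if t == 0.0:
        return 1.0 + 0.0j
    theta = math.copysign(1.0, t)                      # theta = sign t
    half = beta * math.pi / 2.0
    b_coef = math.gamma(1.0 - beta) * (1j * rho * theta * math.sin(half)
                                       - math.cos(half))
    return cmath.exp(abs(t) ** beta * b_coef)


if __name__ == "__main__":
    n = 7
    for beta, rho in ((0.5, 1.0), (1.5, -0.3)):
        b_n = n ** (1.0 / beta)                        # b_n = n^(1/beta)
        for t in (-1.3, 0.4):
            lhs = stable_chf(t / b_n, beta, rho) ** n  # ch.f. of S_n / b_n
            rhs = stable_chf(t, beta, rho)             # ch.f. of one summand
            print(beta, rho, t, abs(lhs - rhs))
```

The check works because $n\,|t/n^{1/\beta}|^\beta = |t|^\beta$ and the sign of $t$ is unchanged by the scaling. For $\beta = 1$ and $\rho \ne 0$ the logarithmic term in (8.8.14) produces an additional shift, so that case is deliberately left out of the sketch.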

The parameter $\rho$ taking values in $[-1, 1]$ is the measure of asymmetry of the
distribution $\mathbf{F}_{\beta,\rho}$. If, for instance, $\rho = 1$ ($\rho_- = 0$), then, for $\beta < 1$, the distribution $\mathbf{F}_{\beta,1}$
is concentrated entirely on the positive half-line. This is evident from the fact that in
this case $\mathbf{F}_{\beta,1}$ can be considered as the limiting distribution for the normalised sums
of independent identically distributed random variables $\xi_k \ge 0$ (with $F_-(0) = 0$).
Since all the prelimit distributions are concentrated on the positive half-line, so is
the limiting distribution.

Similarly, for $\rho = -1$ and $\beta < 1$, the distribution $\mathbf{F}_{\beta,-1}$ is entirely concentrated
on the negative half-line. For $\rho = 0$ ($\rho_+ = \rho_- = 1/2$) the ch.f. of the distribution
$\mathbf{F}_{\beta,0}$ will be real, and the distribution $\mathbf{F}_{\beta,0}$ itself is symmetric.

As we saw above, the ch.f.s $\varphi^{(\beta,\rho)}(t)$ of stable laws $\mathbf{F}_{\beta,\rho}$ admit closed-form
representations. They are clearly integrable over $\mathbb{R}$, and the same is true for the
functions $t^k\varphi^{(\beta,\rho)}(t)$ for any $k \ge 1$. Therefore all the stable distributions have densities
that are differentiable arbitrarily many times (see e.g. the inversion formula (7.2.1)).
As for explicit forms of these densities, they are only known for a few laws. Among
them are:
1. The normal law $\mathbf{F}_{2,\rho}$ (which does not depend on $\rho$).

2. The Cauchy distribution $\mathbf{F}_{1,0}$ with density $2/(\pi^2 + 4x^2)$, $-\infty < x < \infty$.
Scaling the $x$-axis with a factor of $\pi/2$ transforms this density into the form $1/(\pi(1 + x^2))$
corresponding to $\mathbf{K}_{0,1}$.

3. The Lévy distribution. This law can be obtained from the explicit form for
the distribution of the maximum of the Wiener process. This will be the distribution
$\mathbf{F}_{1/2,1}$ with parameters 1/2, 1 and density (up to scaling; cf. (8.8.16))

$$f^{(1/2,1)}(x) = \frac{1}{\sqrt{2\pi}\,x^{3/2}}\,e^{-1/(2x)}, \qquad x > 0$$

(this is the density of the first hitting time of level 1 by the standard Wiener process, see
Theorem 19.2.2).
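
The words "up to scaling" in item 3 can be made explicit with a short computation. The sketch below relies on a standard fact that is not stated in the text: the law with the density displayed above has tail $\mathbf{P}(X \ge t) = \operatorname{erf}(1/\sqrt{2t})$. It checks numerically that this tail behaves like $\sqrt{2/(\pi t)}$, and that after multiplying $X$ by $\pi/2$ the tail behaves like $t^{-1/2}$, which is the form (8.8.16) with $\beta = 1/2$, $\rho_+ = 1$. Function names are hypothetical.

```python
import math


def levy_density(x: float) -> float:
    """Density (2 pi)^(-1/2) x^(-3/2) exp(-1/(2x)), x > 0."""
    return math.exp(-1.0 / (2.0 * x)) / (math.sqrt(2.0 * math.pi) * x ** 1.5)


def levy_tail(t: float) -> float:
    """P(X >= t) = erf(1 / sqrt(2 t)) for the density above."""
    return math.erf(1.0 / math.sqrt(2.0 * t))


def scaled_tail(t: float) -> float:
    """P((pi/2) X >= t) = P(X >= 2 t / pi)."""
    return levy_tail(2.0 * t / math.pi)


if __name__ == "__main__":
    for t in (1e2, 1e4, 1e6):
        print(t,
              levy_tail(t) * math.sqrt(t),     # approaches sqrt(2/pi)
              scaled_tail(t) * math.sqrt(t))   # approaches 1
    # Consistency of tail and density: -d/dt tail(t) should equal density(t).
    h = 1e-4
    print((levy_tail(1.0 - h) - levy_tail(1.0 + h)) / (2.0 * h), levy_density(1.0))
```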


8.8.2  The  Integro-Local  and  Local  Theorems 


Under the conditions of this section we can also obtain integro-local and local
theorems in the same way as in Sect. 8.7 in the case of convergence to the normal law.
As in Sect. 8.7, integro-local theorems deal here with the asymptotics of

$$\mathbf{P}(S_n \in \Delta[x)), \qquad \Delta[x) = [x, x + \Delta)$$

as $n \to \infty$ for a fixed $\Delta > 0$.

As we can see from Theorem 8.8.1, the ch.f. $\varphi^{(\beta,\rho)}(t)$ of the stable law $\mathbf{F}_{\beta,\rho}$ is
integrable, and hence, by the inversion formula, there exists a uniformly continuous
density $f^{(\beta,\rho)}$ of the distribution $\mathbf{F}_{\beta,\rho}$. (As has already been noted, it is not difficult
to show that $f^{(\beta,\rho)}$ is differentiable arbitrarily many times, see Sect. 7.2.)


Theorem 8.8.2 (The Stone integro-local theorem) Let $\xi$ be a non-lattice random
variable and the conditions of Theorem 8.8.1 be met. Then, for any fixed $\Delta > 0$, as
$n \to \infty$,

$$\mathbf{P}(S_n \in \Delta[x)) = \frac{\Delta}{b(n)}\, f^{(\beta,\rho)}\!\left(\frac{x}{b(n)}\right) + o\!\left(\frac{1}{b(n)}\right), \tag{8.8.19}$$

where the remainder term $o(\frac{1}{b(n)})$ is uniform over $x$.

If $\beta = 1$ and $\mathbf{E}|\xi|$ does not exist then, on the right-hand side of (8.8.19), we must
replace $f^{(\beta,\rho)}(\frac{x}{b(n)})$ with $f^{(\beta,\rho)}(\frac{x}{b(n)} - A_n)$, where $A_n$ is defined in (8.8.13).

All the remarks to the integro-local Theorem 8.7.1 hold true here as well, with
evident changes.

If the distribution of $S_n$ has a density then we can find the asymptotics of that
density.



Theorem 8.8.3 Let there exist an $m \ge 1$ such that at least one of conditions (a)-(c)
of Theorem 8.7.2 is satisfied. Moreover, let the conditions of Theorem 8.8.1 be met.
Then for the density $f_{S_n}(x)$ of the distribution of $S_n$ one has the representation

$$f_{S_n}(x) = \frac{1}{b(n)}\left(f^{(\beta,\rho)}\!\left(\frac{x}{b(n)}\right) + o(1)\right), \tag{8.8.20}$$

which holds uniformly in $x$ as $n \to \infty$.

If $\beta = 1$ and $\mathbf{E}|\xi|$ does not exist then, on the right-hand side of (8.8.20), we must
replace $f^{(\beta,\rho)}(\frac{x}{b(n)})$ with $f^{(\beta,\rho)}(\frac{x}{b(n)} - A_n)$, where $A_n$ is defined in (8.8.13).

The assertion of Theorem 8.8.3 can be rewritten for $\zeta_n = \frac{S_n}{b(n)} - A_n$ as

$$f_{\zeta_n}(v) \to f^{(\beta,\rho)}(v)$$

for any $v$ as $n \to \infty$.

For integer-valued $\xi_k$ the following theorem holds true.

Theorem 8.8.4 Let the distribution of $\xi$ be arithmetic and the conditions of
Theorem 8.8.1 be met. Then, uniformly for all integers $x$, as $n \to \infty$,

$$\mathbf{P}(S_n = x) = \frac{1}{b(n)}\, f^{(\beta,\rho)}\!\left(\frac{x - an}{b(n)}\right) + o\!\left(\frac{1}{b(n)}\right), \tag{8.8.21}$$

where $a = \mathbf{E}\xi$ if $\mathbf{E}|\xi|$ exists and $a = 0$ if $\mathbf{E}|\xi|$ does not exist and $\beta \ne 1$. If $\beta = 1$
and $\mathbf{E}|\xi|$ does not exist then, on the right-hand side of (8.8.21), we must replace
$f^{(\beta,\rho)}(\frac{x - an}{b(n)})$ with $f^{(\beta,\rho)}(\frac{x}{b(n)} - A_n)$.

The proofs of Theorems 8.8.2-8.8.4 mostly repeat those of Theorems 8.7.1-8.7.3
and can be found in Appendix 7.


8.8.3  An  Example 

In  conclusion  we  will  consider  an  example. 

In Sect. 12.8 we will see that in the fair game considered in Example 4.2.3 the
ruin time $\eta(z)$ of a gambler with an initial capital of $z$ units satisfies the relation
$\mathbf{P}(\eta(z) \ge n) \sim z\sqrt{2/(\pi n)}$ as $n \to \infty$. In particular, for $z = 1$,

$$\mathbf{P}(\eta(1) \ge n) \sim \sqrt{2/(\pi n)}. \tag{8.8.22}$$
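
Relation (8.8.22) can be checked against exact computations. The sketch below assumes that the fair game is the symmetric $\pm 1$ random walk (the steps are specified this way later in this example) and uses the classical identity $\mathbf{P}(\text{no ruin in } n \text{ steps}) = \binom{n}{\lfloor n/2 \rfloor} 2^{-n}$ for initial capital 1, which is an outside fact rather than a statement of the text. A small dynamic programme confirms the identity for short horizons; the function names are hypothetical.

```python
import math


def survival_closed_form(n: int) -> float:
    """P(capital stays positive for n steps | initial capital 1)."""
    return math.comb(n, n // 2) / 2 ** n


def survival_dp(n: int) -> float:
    """Same probability by propagating the sub-distribution of the capital."""
    probs = {1: 1.0}                       # capital -> probability, not yet ruined
    for _ in range(n):
        nxt = {}
        for capital, p in probs.items():
            for step in (-1, 1):
                c2 = capital + step
                if c2 > 0:                 # capital 0 means ruin: mass is dropped
                    nxt[c2] = nxt.get(c2, 0.0) + 0.5 * p
        probs = nxt
    return sum(probs.values())


if __name__ == "__main__":
    for n in (10, 100, 10_000):
        exact = survival_closed_form(n)
        approx = math.sqrt(2.0 / (math.pi * n))
        print(n, exact, approx, exact / approx)
```

Since the ratio of the two sides tends to 1 only at rate $O(1/n)$, whether the event is written as $\eta(1) > n$ or $\eta(1) \ge n$ does not affect the asymptotics.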

It is not hard to see (for more detail, see also Chap. 12) that $\eta(z)$ has the same
distribution as $\eta_1 + \eta_2 + \cdots + \eta_z$, where $\eta_j$ are independent and distributed as $\eta(1)$.
Thus for studying the distribution of $\eta(z)$ when $z$ is large, by virtue of (8.8.22), one
can make use of Theorem 8.8.4 (with $\beta = 1/2$, $b(n) = 2n^2/\pi$), by which


lim  P 

z^oo 


2nr]{x) 


<  v 


^1/2,  l(x) 


(8.8.23) 


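is the Lévy stable law with parameters $\beta = 1/2$ and $\rho = 1$. Moreover, for integer $x$
and $z \to \infty$,

$$\mathbf{P}(\eta(z) = x) = \frac{\pi}{2z^2}\, f^{(1/2,1)}\!\left(\frac{\pi x}{2z^2}\right) + o\!\left(\frac{1}{z^2}\right).$$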


These assertions enable one to obtain the limiting distribution for the number of
crossings of an arbitrary strip $[u, v]$ by the trajectory $S_1, \ldots, S_n$ in the case where

$$\mathbf{P}(\xi_k = 1) = \mathbf{P}(\xi_k = -1) = 1/2.$$

Indeed, let for simplicity $u = 0$. By the first positive crossing of the strip $[0, v]$ we
will mean the Markov time

$$\eta_+ := \min\{k : S_k = v\}.$$

The first negative crossing of the strip is then defined as the time $\eta_+ + \eta_-$, where

$$\eta_- := \min\{k : S_{\eta_+ + k} = 0\}.$$

The time $\eta_1 = \eta_+ + \eta_-$ will also be the time of the "double crossing" of $[0, v]$. The
variables $\eta_\pm$ are distributed as $\eta(v)$ and are independent, so that $\eta_1$ has the same
distribution as $\eta(2v)$. The variable $H_k = \eta_1(2v) + \cdots + \eta_k(2v)$, where $\eta_i(2v)$ have
the same distribution as $\eta(2v)$ and are independent, is the time of the $k$-th double
crossing. Therefore

$$\nu(n) := \max\{k : H_k \le n\} = \min\{k : H_k > n\} - 1$$

is the number of double crossings of the strip $[0, v]$ by time $n$. Now we can prove
the following assertion:

lim  pl  -x) =  Fi/2,i(y“TT 

n^o o  \  fn  J  \  2vAxA 


(8.8.24) 


To prove it, we will make use of the following relation (which will play, in its
more general form, an important role in Chap. 10):

$$\{\nu(n) \ge k\} = \{H_k \le n\},$$

where $H_k$ is distributed as $\eta(2vk)$. If $n/k^2 \to s^2$ as $n \to \infty$, then by virtue of
(8.8.23)

,2 


(2nHk  2nn 

- n  < - ^ 

(2 vk)2  ~  (2 vky 


Fl/2,1 


JT  S' 


2v2  r 
and therefore


(Here  for  k  =  \x<sfn\  one  has  n/k 2  — >  s2  =  1  /x2.)  Relation  (8.8.24)  is  proved.  □ 

Assertion (8.8.24) will clearly remain true for the number of crossings of the
strip $[u, v]$, $u \ne 0$; one just has to replace $v$ with $v - u$ on the right-hand side of
(8.8.24). It is also clear that (8.8.24) enables one to find the limiting distribution of
the number of "simple" (not double) crossings of $[u, v]$ since the latter is equal to
$2\nu(n)$ or $2\nu(n) + 1$.


Chapter  9 

Large  Deviation  Probabilities  for  Sums 
of  Independent  Random  Variables 


Abstract  The  material  presented  in  this  chapter  is  unique  to  the  present  text.  After 
an  introductory  discussion  of  the  concept  and  importance  of  large  deviation  prob¬ 
abilities,  Cramer’s  condition  is  introduced  and  the  main  properties  of  the  Cramer 
and  Laplace  transforms  are  discussed  in  Sect.  9.1.  A  separate  subsection  is  devoted 
to  an  in-depth  analysis  of  the  key  properties  of  the  large  deviation  rate  function, 
followed  by  Sect.  9.2  establishing  the  fundamental  relationship  between  large  devi¬ 
ation  probabilities  for  sums  of  random  variables  and  those  for  sums  of  their  Cramer 
transforms,  and  discussing  the  probabilistic  meaning  of  the  rate  function.  Then  the 
logarithmic  Large  Deviations  Principle  is  established.  Section  9.3  presents  integro- 
local,  integral  and  local  theorems  on  the  exact  asymptotic  behaviour  of  the  large 
deviation  probabilities  in  the  so-called  Cramer  range  of  deviations.  Section  9.4  is  de¬ 
voted  to  analysing  various  types  of  the  asymptotic  behaviours  of  the  large  deviation 
probabilities  for  deviations  at  the  boundary  of  the  Cramer  range  that  emerge  under 
different  assumptions  on  the  distributions  of  the  random  summands.  In  Sect.  9.5, 
the  behaviour  of  the  large  deviation  probabilities  is  found  in  the  case  of  heavy-tailed 
distributions, namely, when the distribution tails are regularly varying at infinity.
These  results  are  used  in  Sect.  9.6  to  find  the  asymptotics  of  the  large  deviation 
probabilities  beyond  the  Cramer  range  of  deviations,  under  special  assumptions  on 
the  distribution  tails  of  the  summands. 


Let $\xi, \xi_1, \xi_2, \ldots$ be a sequence of independent identically distributed random variables,

$$\mathbf{E}\xi = 0, \qquad \mathbf{E}\xi^2 = \sigma^2 < \infty, \qquad S_n = \sum_{k=1}^{n} \xi_k.$$

Suppose that we have to evaluate the probability $\mathbf{P}(S_n \ge x)$. If $x \sim v\sqrt{n}$ as $n \to \infty$, $v = \mathrm{const}$, then by the integral limit theorem

$$\mathbf{P}(S_n \ge x) \sim 1 - \Phi\Big(\frac{v}{\sigma}\Big) \tag{9.0.1}$$

as $n \to \infty$. But if $x \gg \sqrt{n}$, then the integral limit theorem enables one only to conclude that $\mathbf{P}(S_n \ge x) \to 0$ as $n \to \infty$, which in fact contains no quantitative information on the probability we are after. Essentially the same can happen for fixed but "relatively" large values of $v/\sigma$. For example, for $v/\sigma > 3$ and the values of $n$ around 100, the relative accuracy of the approximation in (9.0.1) becomes, generally speaking, bad (the true value of the left-hand side can be several times greater or smaller than that of the right-hand side). Studying the asymptotic behaviour of $\mathbf{P}(S_n \ge x)$ for $x \gg \sqrt{n}$ as $n \to \infty$, which is not known to us yet, could fill these gaps. This problem is highly relevant since questions of just this kind arise in many problems of mathematical statistics, insurance theory, the theory of queueing systems, etc. For instance, in mathematical statistics, finding small probabilities of errors of the first and second kind of statistical tests when the sample size $n$ is large leads to such problems (e.g. see [7]). In these problems, we have to find explicit functions $P(n, x)$ such that

$$\mathbf{P}(S_n \ge x) = P(n, x)\,(1 + o(1)) \tag{9.0.2}$$

as $n \to \infty$. Thus, unlike the case of normal approximation (9.0.1), here we are looking for approximations $P(n, x)$ with a relatively small error rather than an absolutely small error. If $P(n, x) \to 0$ in (9.0.2) as $n \to \infty$, then we will speak of the probabilities of rare events, or of the probabilities of large deviations of sums $S_n$. Deviations of the order $\sqrt{n}$ are called normal deviations.
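
As a rough numerical illustration of the remark on the relative accuracy of (9.0.1), the sketch below compares an exact tail probability with its normal approximation. The choice of centred Bernoulli summands with $p = 0.1$, $n = 100$ and $v/\sigma = 4$ is an assumption made only for this illustration; it is not taken from the text.

```python
import math

def binom_tail(n, p, k):
    """Exact P(Bin(n, p) >= k)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def normal_tail(v):
    """1 - Phi(v) for the standard normal distribution function Phi."""
    return 0.5 * math.erfc(v / math.sqrt(2))

# Assumed setting: xi = Bernoulli(p) - p, so E xi = 0 and sigma^2 = p(1 - p).
n, p = 100, 0.1
sigma = math.sqrt(p * (1 - p))
k = 22                               # event {S_n >= x} with x = k - n*p = 12
v = (k - n * p) / math.sqrt(n)       # x = v * sqrt(n), here v / sigma = 4
exact = binom_tail(n, p, k)          # left-hand side of (9.0.1)
approx = normal_tail(v / sigma)      # right-hand side of (9.0.1)
ratio = exact / approx
print(f"exact = {exact:.3e}, normal approximation = {approx:.3e}, ratio = {ratio:.1f}")
```

In this setting the exact probability exceeds the normal approximation several times over, which is the kind of relative error that motivates approximations of the form (9.0.2).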

In  order  to  study  large  deviation  probabilities,  we  will  need  some  notions  and 
assertions. 


9.1 Laplace's and Cramér's Transforms. The Rate Function

9.1.1 The Cramér Condition. Laplace's and Cramér's Transforms


In all the sections of this chapter, except for Sect. 9.5, the following Cramér condition will play an important role.

[C] There exists a $\lambda \ne 0$ such that

$$\mathbf{E}e^{\lambda\xi} = \int e^{\lambda y} F(dy) < \infty. \tag{9.1.1}$$

We will say that the right-side (left-side) Cramér condition holds if $\lambda > 0$ ($\lambda < 0$) in (9.1.1). If (9.1.1) is valid for some negative and positive $\lambda$ (i.e. in a neighbourhood of the point $\lambda = 0$), then we will say that the two-sided Cramér condition is satisfied.

The Cramér condition can be interpreted as characterising a fast (at least exponentially fast) rate of decay of the tails $F_\pm(t)$ of the distribution $F$. If, for instance, we have (9.1.1) for $\lambda > 0$, then by Chebyshev's inequality, for $t > 0$,

$$F_+(t) := \mathbf{P}(\xi \ge t) \le e^{-\lambda t}\,\mathbf{E}e^{\lambda\xi},$$

i.e. $F_+(t)$ decreases at least exponentially fast. Conversely, if, for some $\mu > 0$, one has $F_+(t) \le ce^{-\mu t}$, $t > 0$, then, for $\lambda \in (0, \mu)$,

$$\int_0^\infty e^{\lambda y} F(dy) = -\int_0^\infty e^{\lambda y}\, dF_+(y) = F_+(0) + \lambda \int_0^\infty e^{\lambda y} F_+(y)\, dy \le F_+(0) + c\lambda \int_0^\infty e^{(\lambda - \mu) y}\, dy = F_+(0) + \frac{c\lambda}{\mu - \lambda} < \infty.$$

Since the integral $\int_{-\infty}^0 e^{\lambda y} F(dy)$ is finite for any $\lambda > 0$, we have $\mathbf{E}e^{\lambda\xi} < \infty$ for $\lambda \in (0, \mu)$.

The situation is similar for the left tail $F_-(t) := \mathbf{P}(\xi < -t)$ provided that (9.1.1) holds for some $\lambda < 0$.

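Set

$$\lambda_+ := \sup\{\lambda : \mathbf{E}e^{\lambda\xi} < \infty\}, \qquad \lambda_- := \inf\{\lambda : \mathbf{E}e^{\lambda\xi} < \infty\}.$$

Condition [C] is equivalent to $\lambda_+ > \lambda_-$. The right-side Cramér condition means that $\lambda_+ > 0$; the two-sided condition means that $\lambda_+ > 0 > \lambda_-$. Clearly, the ch.f. $\varphi(t) = \mathbf{E}e^{it\xi}$ is analytic in the complex plane in the strip $-\lambda_+ < \operatorname{Im} t < -\lambda_-$. This follows from the differentiability of $\varphi(t)$ in this region of the complex plane, since the integral $\int |y e^{ity}|\, F(dy)$ for the said values of $\operatorname{Im} t$ converges uniformly in $\operatorname{Re} t$.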

Here and henceforth by the Laplace transform (Laplace-Stieltjes or Laplace-Lebesgue) of the distribution $F$ of the random variable $\xi$ we shall mean the function

$$\psi(\lambda) := \mathbf{E}e^{\lambda\xi} = \varphi(-i\lambda),$$

which conflicts with Sect. 7.1.1 (and the terminology of mathematical analysis), according to which the term Laplace's transform refers to the function $\mathbf{E}e^{-\lambda\xi} = \varphi(i\lambda)$. The reason for such a slight inconsistency in terminology (only the sign of the argument differs, which changes almost nothing) is our reluctance to introduce new notation or to complicate the old notation. Nowhere below will it cause confusion.

As well as condition [C], we will also assume that the random variable $\xi$ is nondegenerate, i.e. $\xi \not\equiv \mathrm{const}$ or, which is the same, $\operatorname{Var}\xi > 0$.

The main properties of Laplace's transform.

As was already noted in Sect. 7.1.1, Laplace's transform, like the ch.f., uniquely characterises the distribution $F$. Moreover, it has the following properties, which are similar to the corresponding properties of ch.f.s (see Sect. 7.1). Under obvious conventions of notation,

(Ψ1) $\psi_{a + b\xi}(\lambda) = e^{\lambda a}\,\psi_\xi(b\lambda)$, if $a$ and $b$ are constant.

¹In the literature, the function $\mathbf{E}e^{\lambda\xi}$ is sometimes called the "moment generating function".

(Ψ2) If $\xi_1, \ldots, \xi_n$ are independent and $S_n = \sum_{j=1}^n \xi_j$, then

$$\psi_{S_n}(\lambda) = \prod_{j=1}^{n} \psi_{\xi_j}(\lambda).$$

(Ψ3) If $\mathbf{E}|\xi|^k < \infty$ and the right-side Cramér condition is satisfied then the function $\psi_\xi$ is $k$-times right differentiable at the point $\lambda = 0$,

$$\psi_\xi^{(k)}(0) = \mathbf{E}\xi^k =: m_k,$$

and, as $\lambda \downarrow 0$,

$$\psi_\xi(\lambda) = 1 + \sum_{j=1}^{k} \frac{\lambda^j}{j!}\, m_j + o(\lambda^k).$$

This also implies that, as $\lambda \downarrow 0$, the representation

$$\ln \psi_\xi(\lambda) = \sum_{j=1}^{k} \frac{\gamma_j \lambda^j}{j!} + o(\lambda^k) \tag{9.1.2}$$

holds, where $\gamma_j$ are the so-called semi-invariants (or cumulants) of order $j$ of the random variable $\xi$. One can easily verify that

$$\gamma_1 = m_1, \qquad \gamma_2 = m_2^0 = \sigma^2, \qquad \gamma_3 = m_3^0, \quad \ldots, \tag{9.1.3}$$

where $m_k^0 = \mathbf{E}(\xi - m_1)^k$ is the central moment of order $k$.

Definition 9.1.1 Let condition [C] be met. The Cramér transform at the point $\lambda$ of the distribution $F$ is the distribution²

$$F_{(\lambda)}(dy) = \frac{e^{\lambda y} F(dy)}{\psi(\lambda)}. \tag{9.1.4}$$


²In some publications the transform (9.1.4) is also called the Esscher transform. However, the systematic use of transform (9.1.4) for the study of large deviations was first done by Cramér.

If we study the probabilities of large deviations of sums of random variables using the inversion formula, similarly to what was done for normal deviations in Chap. 8, then we will necessarily come to employ the so-called saddle-point method, which consists of moving the contour of integration so that it passes through the so-called saddle point, at which the exponent in the integrand function, as we move along the imaginary axis, attains its minimum (and, along the real axis, attains its maximum; this explains the name "saddle point"). Cramér's transform does essentially the same, making such a translation of the contour of integration even before applying the inversion formula, and reduces the large deviation problem to the normal deviation problem, where the inversion formula is not needed if we use the results of Chap. 8. It is this technique that we will follow in the present chapter.

Clearly, the distributions $F$ and $F_{(\lambda)}$ are mutually absolutely continuous (see Sect. 3.5 of Appendix 3) with density

$$\frac{F_{(\lambda)}(dy)}{F(dy)} = \frac{e^{\lambda y}}{\psi(\lambda)}.$$

Denote a random variable with distribution $F_{(\lambda)}$ by $\xi_{(\lambda)}$.

The Laplace transform of the distribution $F_{(\lambda)}$ is obviously equal to

$$\mathbf{E}e^{\mu \xi_{(\lambda)}} = \frac{\psi(\lambda + \mu)}{\psi(\lambda)}. \tag{9.1.5}$$

Clearly,

$$\mathbf{E}\xi_{(\lambda)} = \frac{\psi'(\lambda)}{\psi(\lambda)} = (\ln \psi(\lambda))', \qquad \operatorname{Var}(\xi_{(\lambda)}) = \frac{\psi''(\lambda)}{\psi(\lambda)} - \Big(\frac{\psi'(\lambda)}{\psi(\lambda)}\Big)^2 = (\ln \psi(\lambda))''.$$

Since $\psi''(\lambda) > 0$ and $\operatorname{Var}(\xi_{(\lambda)}) > 0$, the foregoing implies one more important property of the Laplace transform.

(Ψ4) The functions $\psi(\lambda)$ and $\ln \psi(\lambda)$ are strictly convex, and

$$\frac{\psi'(\lambda)}{\psi(\lambda)} = \mathbf{E}\xi_{(\lambda)}$$

strictly increases on $(\lambda_-, \lambda_+)$.
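
The relations between the Cramér transform and the derivatives of $\ln\psi$ can be checked numerically. The following sketch tilts a small discrete distribution and compares the mean and variance of the tilted law with finite-difference derivatives of $\ln\psi(\lambda)$. The three-point distribution, the value $\lambda = 0.4$ and the helper names are assumptions chosen only for illustration.

```python
import math

def psi(values, probs, lam):
    """Laplace transform psi(lam) = E exp(lam * xi) of a discrete distribution."""
    return sum(p * math.exp(lam * y) for y, p in zip(values, probs))

def cramer_transform(values, probs, lam):
    """Probabilities of the tilted law F_(lam)(dy) = exp(lam*y) F(dy) / psi(lam)."""
    norm = psi(values, probs, lam)
    return [p * math.exp(lam * y) / norm for y, p in zip(values, probs)]

def mean_var(values, probs):
    m = sum(p * y for y, p in zip(values, probs))
    return m, sum(p * (y - m) ** 2 for y, p in zip(values, probs))

# Hypothetical three-point distribution used only as an example.
values = [-1.0, 0.0, 2.0]
probs = [0.5, 0.3, 0.2]
lam, h = 0.4, 1e-4

tilted = cramer_transform(values, probs, lam)
m, v = mean_var(values, tilted)

log_psi = lambda l: math.log(psi(values, probs, l))
d1 = (log_psi(lam + h) - log_psi(lam - h)) / (2 * h)                    # (ln psi)'
d2 = (log_psi(lam + h) - 2 * log_psi(lam) + log_psi(lam - h)) / h**2    # (ln psi)''
print(f"mean {m:.6f} vs (ln psi)' {d1:.6f}; variance {v:.6f} vs (ln psi)'' {d2:.6f}")
```

The positive variance of the tilted law is exactly what makes $\ln\psi$ strictly convex in property (Ψ4).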

The analyticity of $\psi(\lambda)$ in the strip $\operatorname{Re}\lambda \in (\lambda_-, \lambda_+)$ can be supplemented by the following "extended" continuity property on the segment $[\lambda_-, \lambda_+]$ (in the strip $\operatorname{Re}\lambda \in [\lambda_-, \lambda_+]$).

(Ψ5) The function $\psi(\lambda)$ is continuous "inside" $[\lambda_-, \lambda_+]$, i.e. $\psi(\lambda_\pm \mp 0) = \psi(\lambda_\pm)$ (where the cases $\psi(\lambda_\pm) = \infty$ are not excluded).

Outside the segment $[\lambda_-, \lambda_+]$ such continuity, generally speaking, does not hold as, for example, is the case when $\psi(\lambda_+) < \infty$ and $\psi(\lambda_+ + 0) = \infty$, which takes place, say, for the distribution $F$ with density $f(x) = cx^{-3}e^{-\lambda_+ x}$ for $x \ge 1$, $c = \mathrm{const}$.


9.1.2  The  Large  Deviation  Rate  Function 

Under condition [C], the large deviation rate function will play the determining role in the description of asymptotics of probabilities $\mathbf{P}(S_n \ge x)$.

Definition 9.1.2 The large deviation rate function (or, for brevity, simply the rate function) $\Lambda$ of a random variable $\xi$ is defined by

$$\Lambda(\alpha) := \sup_{\lambda}\big(\alpha\lambda - \ln \psi(\lambda)\big). \tag{9.1.6}$$

The meaning of the name will become clear later. In classical analysis, the right-hand side of (9.1.6) is known as the Legendre transform of the function $\ln \psi(\lambda)$.

Consider the function $A(\alpha, \lambda) = \alpha\lambda - \ln \psi(\lambda)$ of the supremum appearing in (9.1.6). The function $-\ln \psi(\lambda)$ is strictly concave (see property (Ψ4)), and hence so is the function $A(\alpha, \lambda)$ (note also that $A(\alpha, \lambda) = -\ln \psi_\alpha(\lambda)$, where $\psi_\alpha(\lambda) = e^{-\lambda\alpha}\psi(\lambda)$ is the Laplace transform of the distribution of the random variable $\xi - \alpha$ and, therefore, from the "qualitative point of view", $A(\alpha, \lambda)$ possesses all the properties of the function $-\ln \psi(\lambda)$). The foregoing implies that there always exists a unique point $\lambda = \lambda(\alpha)$ (on the "extended" real line $[-\infty, \infty]$) at which the supremum in (9.1.6) is attained. As $\alpha$ grows, the value of $A(\alpha, \lambda)$ for $\lambda > 0$ increases (proportionally to $\lambda$), and for $\lambda < 0$ it decreases. Therefore, the graph of $A(\alpha, \lambda)$ as the function of $\lambda$ will, roughly speaking, "roll over" to the right as $\alpha$ grows. This means that the maximum point $\lambda(\alpha)$ will also move to the right (or stay at the same place if $\lambda(\alpha) = \lambda_+$).

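We now turn to more precise formulations. On the interval $[\lambda_-, \lambda_+]$, there exists the derivative (respectively, the right and the left derivative at the endpoints $\lambda_\pm$)

$$A'_\lambda(\alpha, \lambda) = \alpha - \frac{\psi'(\lambda)}{\psi(\lambda)}. \tag{9.1.7}$$

The parameters

$$\alpha_\pm = \frac{\psi'(\lambda_\pm \mp 0)}{\psi(\lambda_\pm \mp 0)}, \qquad \alpha_- < \alpha_+, \tag{9.1.8}$$

will play an important role in what follows. The value of $\alpha_+$ determines the angle at which the curve $\ln \psi(\lambda)$ "sticks" into the point $(\lambda_+, \ln \psi(\lambda_+))$. The quantity $\alpha_-$ has a similar meaning. If $\alpha \in [\alpha_-, \alpha_+]$ then the equation $A'_\lambda(\alpha, \lambda) = 0$, or (see (9.1.7))

$$\frac{\psi'(\lambda)}{\psi(\lambda)} = \alpha, \tag{9.1.9}$$

always has a unique solution $\lambda(\alpha)$ on the segment $[\lambda_-, \lambda_+]$ ($\lambda_\pm$ can be infinite). This solution $\lambda(\alpha)$, being the inverse of an analytical and strictly increasing function $\psi'(\lambda)/\psi(\lambda)$ on $(\lambda_-, \lambda_+)$ (see (9.1.9)), is also analytical and strictly increasing on $(\alpha_-, \alpha_+)$,

$$\lambda(\alpha) \uparrow \lambda_+ \ \text{as}\ \alpha \uparrow \alpha_+; \qquad \lambda(\alpha) \downarrow \lambda_- \ \text{as}\ \alpha \downarrow \alpha_-. \tag{9.1.10}$$

The equalities

$$\Lambda(\alpha) = \alpha\lambda(\alpha) - \ln \psi(\lambda(\alpha)), \qquad \frac{\psi'(\lambda(\alpha))}{\psi(\lambda(\alpha))} = \alpha \tag{9.1.11}$$

yield

$$\Lambda'(\alpha) = \lambda(\alpha) + \alpha\lambda'(\alpha) - \frac{\psi'(\lambda(\alpha))}{\psi(\lambda(\alpha))}\,\lambda'(\alpha) = \lambda(\alpha).$$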


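Recalling that

$$\frac{\psi'(0)}{\psi(0)} = m_1 = \mathbf{E}\xi, \qquad 0 \in [\lambda_-, \lambda_+], \qquad m_1 \in [\alpha_-, \alpha_+],$$

we obtain the following representation for the function $\Lambda$:

(Λ1) If $\alpha_0 \in [\alpha_-, \alpha_+]$, $\alpha \in [\alpha_-, \alpha_+]$ then

$$\Lambda(\alpha) = \Lambda(\alpha_0) + \int_{\alpha_0}^{\alpha} \lambda(v)\, dv. \tag{9.1.12}$$

Since $\lambda(m_1) = \Lambda(m_1) = 0$ (this follows from (9.1.9) and (9.1.11)), we obtain, in particular, for $\alpha_0 = m_1$, that

$$\Lambda(\alpha) = \int_{m_1}^{\alpha} \lambda(v)\, dv. \tag{9.1.13}$$

The functions $\lambda(\alpha)$ and $\Lambda(\alpha)$ are analytic on $(\alpha_-, \alpha_+)$.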

Now consider what happens outside the segment $[\alpha_-, \alpha_+]$. Assume for definiteness that $\lambda_+ > 0$. We will study the behaviour of the functions $\lambda(\alpha)$ and $\Lambda(\alpha)$ near the point $\alpha_+$ and for $\alpha > \alpha_+$. Similar results hold true in the vicinity of the point $\alpha_-$ in the case $\lambda_- < 0$.

First let $\lambda_+ = \infty$, i.e. the function $\ln \psi(\lambda)$ is analytic on the whole semiaxis $\lambda > 0$, and the tail $F_+(t)$ decays as $t \to \infty$ faster than any exponential function. Denote by

$$s_\pm = \pm \sup\{t : F_\pm(t) > 0\}$$

the boundaries of the support of $F$. Without loss of generality, we will assume that

$$s_+ > 0, \qquad s_- < 0. \tag{9.1.14}$$

This can always be achieved by shifting the random variable, similarly to our assuming, without loss of generality, $\mathbf{E}\xi = 0$ in many theorems of Chap. 8, where we used the fact that the problem of studying the distribution of $S_n$ is "invariant" with respect to a shift. (We can also note that $\Lambda_{\xi - a}(\alpha - a) = \Lambda_\xi(\alpha)$, see property (Λ4) below, and that (9.1.14) always holds provided that $\mathbf{E}\xi = 0$.)

(Λ2) (i) If $\lambda_+ = \infty$ then $\alpha_+ = s_+$.

Hence, for $s_+ = \infty$, we always have $\alpha_+ = \infty$ and so for any $\alpha \ge \alpha_-$ we are dealing with the already considered "regular" case, where (9.1.12) and (9.1.13) hold true.

(ii) If $s_+ < \infty$ then $\lambda_+ = \infty$, $\alpha_+ = s_+$,

$$\Lambda(\alpha_+) = -\ln \mathbf{P}(\xi = s_+), \qquad \Lambda(\alpha) = \infty \ \text{for}\ \alpha > \alpha_+.$$

Similar assertions hold true for $s_-$, $\alpha_-$, $\lambda_-$.

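Proof (i) First let $s_+ < \infty$. Then the asymptotics of $\psi(\lambda)$ and $\psi'(\lambda)$ as $\lambda \to \infty$ is determined by the integrals in a neighbourhood of the point $s_+$: for any fixed $\varepsilon > 0$,

$$\psi(\lambda) \sim \mathbf{E}(e^{\lambda\xi};\ \xi > s_+ - \varepsilon), \qquad \psi'(\lambda) \sim \mathbf{E}(\xi e^{\lambda\xi};\ \xi > s_+ - \varepsilon)$$

as $\lambda \to \infty$. Hence

$$\alpha_+ = \lim_{\lambda \to \infty} \frac{\psi'(\lambda)}{\psi(\lambda)} = \lim_{\lambda \to \infty} \frac{\mathbf{E}(\xi e^{\lambda\xi};\ \xi > s_+ - \varepsilon)}{\mathbf{E}(e^{\lambda\xi};\ \xi > s_+ - \varepsilon)} = s_+.$$

If $s_+ = \infty$, then $\ln \psi(\lambda)$ grows as $\lambda \to \infty$ faster than any linear function and therefore the derivative $(\ln \psi(\lambda))'$ increases unboundedly, $\alpha_+ = \infty$.

(ii) The first two assertions are obvious. Further, let $p_+ = \mathbf{P}(\xi = s_+) > 0$. Then

$$\psi(\lambda) \sim p_+ e^{\lambda s_+},$$

$$\alpha\lambda - \ln \psi(\lambda) = \alpha\lambda - \ln p_+ - \lambda s_+ + o(1) = (\alpha - \alpha_+)\lambda - \ln p_+ + o(1)$$

as $\lambda \to \infty$. This and (9.1.11) imply that

$$\Lambda(\alpha) = \begin{cases} -\ln p_+ & \text{for } \alpha = \alpha_+, \\ \infty & \text{for } \alpha > \alpha_+. \end{cases}$$

If $p_+ = 0$, then the relation $\psi(\lambda) = o(e^{\lambda s_+})$ as $\lambda \to \infty$ similarly implies $\Lambda(\alpha_+) = \infty$. Property (Λ2) is proved. □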


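Now let $0 < \lambda_+ < \infty$. If $\alpha_+ < \infty$, then necessarily $\psi(\lambda_+) < \infty$, $\psi(\lambda_+ + 0) = \infty$ and $\psi'(\lambda_+) < \infty$ (here we mean the left derivative). If we assume that $\psi(\lambda_+) = \infty$, then $\ln \psi(\lambda_+) = \infty$, $(\ln \psi(\lambda))' \to \infty$ as $\lambda \uparrow \lambda_+$ and $\alpha_+ = \infty$, which contradicts the assumption $\alpha_+ < \infty$. Since $\psi(\lambda) = \infty$ for $\lambda > \lambda_+$, the point $\lambda(\alpha)$, having reached the value $\lambda_+$ as $\alpha$ grows, will stop at that point. So, for $\alpha \ge \alpha_+$, we have

$$\lambda(\alpha) = \lambda_+, \qquad \Lambda(\alpha) = \alpha\lambda_+ - \ln \psi(\lambda_+) = \Lambda(\alpha_+) + \lambda_+(\alpha - \alpha_+). \tag{9.1.15}$$

Thus, in this case, for $\alpha \ge \alpha_+$ the function $\lambda(\alpha)$ remains constant, while $\Lambda(\alpha)$ grows linearly. Relations (9.1.12) and (9.1.13) remain true.

If $\alpha_+ = \infty$, then $\alpha < \alpha_+$ for all finite $\alpha \ge \alpha_-$, and we again deal with the "regular" case that we considered earlier (see (9.1.12) and (9.1.13)). Since $\lambda(\alpha)$ does not decrease, these relations imply the convexity of $\Lambda(\alpha)$.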

In summary, we can formulate the following property.

(Λ3) The functions $\lambda(\alpha)$ and $\Lambda(\alpha)$ can only be discontinuous at the points $s_\pm$ and under the condition $\mathbf{P}(\xi = s_\pm) > 0$. These points separate the domain $(s_-, s_+)$ where the function $\Lambda$ is finite and continuous (in the extended sense) from the domain $\alpha \notin [s_-, s_+]$ where $\Lambda(\alpha) = \infty$. In the domain $[s_-, s_+]$ the function $\Lambda$ is convex. (If we define convexity in the "extended" sense, i.e. including infinite values as well, then $\Lambda$ is convex on the entire real line.) The function $\Lambda$ is analytic in the interval $(\alpha_-, \alpha_+)$. If $\lambda_+ < \infty$ and $\alpha_+ < \infty$, then on the half-line $(\alpha_+, \infty)$ the function $\Lambda(\alpha)$ is linear with slope $\lambda_+$; at the boundary point $\alpha_+$ the continuity of the first derivatives persists. If $\lambda_+ = \infty$, then $\Lambda(\alpha) = \infty$ on $(\alpha_+, \infty)$. The function $\Lambda(\alpha)$ possesses a similar property on $(-\infty, \alpha_-)$.

If $\lambda_- = 0$, then $\alpha_- = m_1$ and $\lambda(\alpha) = \Lambda(\alpha) = 0$ for $\alpha \le m_1$.

Indeed, since $\lambda(m_1) = 0$ and $\psi(\lambda) = \infty$ for $\lambda < \lambda_- = 0 = \lambda(m_1)$, as the value of $\alpha$ decreases to $\alpha_- = m_1$, the point $\lambda(\alpha)$, having reached the value 0, will stop, and $\lambda(\alpha) = 0$ for $\alpha \le \alpha_- = m_1$. This and the first identity in (9.1.11) also imply that $\Lambda(\alpha) = 0$ for $\alpha \le m_1$.

If $\lambda_- = \lambda_+ = 0$ (condition [C] is not met), then $\lambda(\alpha) = \Lambda(\alpha) = 0$ for all $\alpha$. This is obvious, since the value of the function under the sup sign in (9.1.6) equals $-\infty$ for all $\lambda \ne 0$. In this case the limit theorems presented in the forthcoming sections will be of little substance.

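We will also need the following properties of the function $\Lambda$.

(Λ4) Under obvious notational conventions, for independent random variables $\xi$ and $\eta$, we have

$$\Lambda_{\xi + \eta}(\alpha) = \sup_{\lambda}\big(\alpha\lambda - \ln \psi_\xi(\lambda) - \ln \psi_\eta(\lambda)\big) = \inf_{\gamma}\big(\Lambda_\xi(\gamma) + \Lambda_\eta(\alpha - \gamma)\big),$$

$$\Lambda_{c\xi + b}(\alpha) = \sup_{\lambda}\big(\alpha\lambda - \lambda b - \ln \psi_\xi(\lambda c)\big) = \Lambda_\xi\Big(\frac{\alpha - b}{c}\Big).$$

Clearly, $\inf_\gamma$ in the former relation is attained at the point $\gamma$ at which $\lambda_\xi(\gamma) = \lambda_\eta(\alpha - \gamma)$. If $\xi$ and $\eta$ are identically distributed then $\gamma = \alpha/2$ and therefore

$$\Lambda_{\xi + \eta}(\alpha) = \Lambda_\xi\Big(\frac{\alpha}{2}\Big) + \Lambda_\eta\Big(\frac{\alpha}{2}\Big) = 2\Lambda_\xi\Big(\frac{\alpha}{2}\Big).$$

(Λ5) The function $\Lambda(\alpha)$ attains its minimal value 0 at the point $\alpha = \mathbf{E}\xi = m_1$. For definiteness, assume that $\alpha_+ > 0$. If $m_1 = 0$ and $\mathbf{E}|\xi|^k < \infty$, then

$$\lambda(0) = \Lambda(0) = \Lambda'(0) = 0, \qquad \Lambda''(0) = \frac{1}{\gamma_2}, \qquad \Lambda'''(0) = -\frac{\gamma_3}{\gamma_2^3}, \quad \ldots \tag{9.1.16}$$

(In the case $\alpha_- = 0$ the right derivatives are intended.) As $\alpha \downarrow 0$, one has the representation

$$\Lambda(\alpha) = \sum_{j=2}^{k} \frac{\Lambda^{(j)}(0)}{j!}\,\alpha^j + o(\alpha^k). \tag{9.1.17}$$

The semi-invariants $\gamma_j$ were defined in (9.1.2) and (9.1.3).

If the two-sided Cramér condition is satisfied then the series expansion (9.1.17) of the function $\Lambda(\alpha)$ holds for $k = \infty$. This series is called the Cramér series. Verifying properties (Λ4) and (Λ5) is not difficult, and is left to the reader.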

(Λ6) The following inversion formula is valid: for $\lambda \in (\lambda_-, \lambda_+)$,

$$\ln \psi(\lambda) = \sup_{\alpha}\big(\alpha\lambda - \Lambda(\alpha)\big). \tag{9.1.18}$$

This means that the rate function uniquely determines the Laplace transform $\psi(\lambda)$ and hence the distribution $F$ as well. Formula (9.1.18) also means that applying the Legendre transform twice in succession to the convex function $\ln \psi(\lambda)$ leads back to the same original function.

Proof We denote by $T(\lambda)$ the right-hand side of (9.1.18) and show that $T(\lambda) = \ln \psi(\lambda)$ for $\lambda \in (\lambda_-, \lambda_+)$. If, in order to find the supremum in (9.1.18), we equate to zero the derivative in $\alpha$ of the function under the sup sign, then we will get the equation

$$\lambda = \Lambda'(\alpha) = \lambda(\alpha). \tag{9.1.19}$$

Since $\lambda(\alpha)$, $\alpha \in (\alpha_-, \alpha_+)$, is the function inverse to $(\ln \psi(\lambda))'$ (see (9.1.9)), for $\lambda \in (\lambda_-, \lambda_+)$ Eq. (9.1.19) clearly has the solution

$$\alpha = a(\lambda) := (\ln \psi(\lambda))'. \tag{9.1.20}$$

Taking into account the fact that $\lambda(a(\lambda)) = \lambda$, we obtain

$$T(\lambda) = \lambda a(\lambda) - \Lambda(a(\lambda)),$$

$$T'(\lambda) = a(\lambda) + \lambda a'(\lambda) - \lambda(a(\lambda))\,a'(\lambda) = a(\lambda).$$

Since $a(0) = m_1$ and $T(0) = -\Lambda(m_1) = 0$, we have

$$T(\lambda) = \int_0^{\lambda} a(u)\, du = \ln \psi(\lambda). \tag{9.1.21}$$

The assertion is proved, and so is yet another inversion formula (the last equality in (9.1.21), which expresses $\ln \psi(\lambda)$ as the integral of the function $a(\lambda)$ inverse to $\lambda(\alpha)$). □
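
The inversion formula (9.1.18) can be checked numerically for the Bernoulli scheme, whose rate function is known in closed form (see Example 9.1.2). The sketch below is only an illustration: the parameter $p = 0.3$, the test values of $\lambda$ and the helper `sup_concave` (a plain golden-section search) are assumptions, not part of the text.

```python
import math

def rate_bernoulli(alpha, p):
    """Closed-form rate function of the Bernoulli scheme on [0, 1]."""
    def term(a, b):
        # a * ln(a / b) with the convention 0 * ln 0 = 0
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(alpha, p) + term(1 - alpha, 1 - p)

def sup_concave(g, lo, hi, iters=200):
    """Golden-section search for the maximum value of a concave function on [lo, hi]."""
    r = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - r * (b - a), a + r * (b - a)
        if g(c) < g(d):
            a = c          # the maximiser lies in [c, b]
        else:
            b = d          # the maximiser lies in [a, d]
    return g((a + b) / 2)

p = 0.3
checks = []
for lam in (-2.0, -0.5, 0.0, 1.0, 3.0):
    lhs = math.log(p * math.exp(lam) + 1 - p)   # ln psi(lam)
    rhs = sup_concave(lambda a: a * lam - rate_bernoulli(a, p), 0.0, 1.0)
    checks.append((lam, lhs, rhs))
    print(f"lam = {lam:5.1f}: ln psi = {lhs:.10f}, sup(alpha*lam - Lambda) = {rhs:.10f}")
```

The two columns agree up to the accuracy of the search, as the double Legendre transform of a convex function should.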

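(Λ7) The exponential Chebyshev inequality. For $\alpha \ge m_1$, we have

$$\mathbf{P}(S_n \ge \alpha n) \le e^{-n\Lambda(\alpha)}.$$

Proof If $\alpha \ge m_1$, then $\lambda(\alpha) \ge 0$. For $\lambda = \lambda(\alpha) \ge 0$, we have

$$\psi^n(\lambda) \ge \mathbf{E}(e^{\lambda S_n};\ S_n \ge \alpha n) \ge e^{\lambda\alpha n}\,\mathbf{P}(S_n \ge \alpha n);$$

$$\mathbf{P}(S_n \ge \alpha n) \le e^{-\alpha n\lambda(\alpha) + n \ln \psi(\lambda(\alpha))} = e^{-n\Lambda(\alpha)}.$$

□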
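
A quick numerical check of the exponential Chebyshev inequality can be made in the Bernoulli scheme, where both the exact tail and the rate function are available in closed form. The values $n = 200$, $p = 0.3$ and the grid of levels are assumptions for illustration only.

```python
import math

def rate_bernoulli(alpha, p):
    """Rate function of the Bernoulli scheme for alpha in (0, 1)."""
    return alpha * math.log(alpha / p) + (1 - alpha) * math.log((1 - alpha) / (1 - p))

def binom_tail(n, p, k):
    """Exact P(S_n >= k) for S_n ~ Bin(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

n, p = 200, 0.3
rows = []
for k in range(60, 200, 20):          # levels alpha = k/n >= m_1 = p
    alpha = k / n
    exact = binom_tail(n, p, k)
    bound = math.exp(-n * rate_bernoulli(alpha, p))
    rows.append((alpha, exact, bound))
    print(f"alpha = {alpha:.2f}: P(S_n >= alpha*n) = {exact:.3e} <= {bound:.3e}")
```

The bound has the right exponential order but is not sharp: the ratio of the two columns is not bounded away from zero, which is why the finer theorems of Sect. 9.3 are needed.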


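We now consider a few examples, where the values of $\lambda_\pm$, $\alpha_\pm$, and the functions $\psi(\lambda)$, $\lambda(\alpha)$, $\Lambda(\alpha)$ can be calculated in an explicit form.

Example 9.1.1 If $\xi \mathrel{\subset\!\!\!=} \Phi_{0,1}$, then

$$\psi(\lambda) = e^{\lambda^2/2}, \qquad |\lambda_\pm| = |\alpha_\pm| = \infty, \qquad \lambda(\alpha) = \alpha, \qquad \Lambda(\alpha) = \frac{\alpha^2}{2}.$$

Example 9.1.2 For the Bernoulli scheme $\xi \mathrel{\subset\!\!\!=} \mathbf{B}_p$, we have

$$\psi(\lambda) = pe^{\lambda} + q, \qquad |\lambda_\pm| = \infty, \qquad \alpha_+ = 1, \qquad \alpha_- = 0, \qquad m_1 = \mathbf{E}\xi = p,$$

$$\lambda(\alpha) = \ln \frac{\alpha(1 - p)}{p(1 - \alpha)}, \qquad \Lambda(\alpha) = \alpha \ln \frac{\alpha}{p} + (1 - \alpha) \ln \frac{1 - \alpha}{1 - p} \quad \text{for } \alpha \in (0, 1),$$

$$\Lambda(0) = -\ln(1 - p), \qquad \Lambda(1) = -\ln p, \qquad \Lambda(\alpha) = \infty \quad \text{for } \alpha \notin [0, 1].$$

Thus the function $H(\alpha) = \Lambda(\alpha)$, which described large deviation probabilities for $S_n$ in the local Theorem 5.2.1 for the Bernoulli scheme, is nothing else but the rate function. Below, in Sect. 9.3, we will obtain generalisations of Theorem 5.2.1 for arbitrary arithmetic distributions.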


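Example 9.1.3 For the exponential distribution $\Gamma_\beta$, we have

$$\psi(\lambda) = \frac{\beta}{\beta - \lambda}, \qquad \lambda_+ = \beta, \qquad \lambda_- = -\infty, \qquad \alpha_+ = \infty, \qquad \alpha_- = 0, \qquad m_1 = \frac{1}{\beta},$$

$$\lambda(\alpha) = \beta - \frac{1}{\alpha}, \qquad \Lambda(\alpha) = \alpha\beta - 1 - \ln \alpha\beta \quad \text{for } \alpha > 0.$$

Example 9.1.4 For the centred Poisson distribution with parameter $\beta$, we have

$$\psi(\lambda) = \exp\{\beta[e^{\lambda} - 1 - \lambda]\}, \qquad |\lambda_\pm| = \infty, \qquad \alpha_- = -\beta, \qquad \alpha_+ = \infty, \qquad m_1 = 0,$$

$$\lambda(\alpha) = \ln \frac{\beta + \alpha}{\beta}, \qquad \Lambda(\alpha) = (\alpha + \beta) \ln \frac{\alpha + \beta}{\beta} - \alpha \quad \text{for } \alpha > -\beta.$$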
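
The closed forms for the normal, Bernoulli, exponential and centred Poisson cases can be cross-checked by evaluating the Legendre transform (9.1.6) numerically. The sketch below is one possible way of doing this, assuming a golden-section search over a bounded interval of $\lambda$; the helper `legendre`, the search intervals and the parameter values are assumptions made for the illustration.

```python
import math

def legendre(log_psi, alpha, lo, hi, iters=200):
    """Golden-section search for sup over lam in [lo, hi] of alpha*lam - log_psi(lam).

    The objective is concave in lam, so a unimodal search suffices.
    Returns the pair (maximiser, maximum)."""
    g = lambda lam: alpha * lam - log_psi(lam)
    r = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - r * (b - a), a + r * (b - a)
        if g(c) < g(d):
            a = c
        else:
            b = d
    lam = (a + b) / 2
    return lam, g(lam)

p, beta = 0.3, 2.0
cases = {
    # name: (ln psi, alpha, search interval, closed-form lambda(alpha), closed-form Lambda(alpha))
    "normal": (lambda l: l * l / 2, 1.3, (-50.0, 50.0), 1.3, 1.3**2 / 2),
    "bernoulli": (lambda l: math.log(p * math.exp(l) + 1 - p), 0.6, (-50.0, 50.0),
                  math.log(0.6 * (1 - p) / (p * 0.4)),
                  0.6 * math.log(0.6 / p) + 0.4 * math.log(0.4 / (1 - p))),
    "exponential": (lambda l: math.log(beta / (beta - l)), 1.5, (-50.0, beta - 1e-9),
                    beta - 1 / 1.5, 1.5 * beta - 1 - math.log(1.5 * beta)),
    "poisson": (lambda l: beta * (math.exp(l) - 1 - l), 1.0, (-50.0, 50.0),
                math.log((beta + 1.0) / beta),
                (1.0 + beta) * math.log((1.0 + beta) / beta) - 1.0),
}

results = {}
for name, (log_psi, alpha, (lo, hi), lam_exact, rate_exact) in cases.items():
    lam_num, rate_num = legendre(log_psi, alpha, lo, hi)
    results[name] = (lam_num, lam_exact, rate_num, rate_exact)
    print(f"{name:12s} lambda: {lam_num:.6f} vs {lam_exact:.6f}   "
          f"Lambda: {rate_num:.8f} vs {rate_exact:.8f}")
```

For the exponential case the search interval stops just below $\lambda_+ = \beta$, since $\psi(\lambda) = \infty$ beyond that point.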
9.2 A Relationship Between Large Deviation Probabilities for
Sums of Random Variables and Those for Sums of Their
Cramér Transforms. The Probabilistic Meaning of the Rate
Function

9.2.1 A Relationship Between Large Deviation Probabilities for
Sums of Random Variables and Those for Sums of Their
Cramér Transforms

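Consider the Cramér transform of $F$ at the point $\lambda = \lambda(\alpha)$ for $\alpha \in [\alpha_-, \alpha_+]$ and introduce the notation $\xi^{(\alpha)} := \xi_{(\lambda(\alpha))}$,

$$S_n^{(\alpha)} := \sum_{i=1}^{n} \xi_i^{(\alpha)},$$

where $\xi_i^{(\alpha)}$ are independent copies of $\xi^{(\alpha)}$. The distribution $F^{(\alpha)} := F_{(\lambda(\alpha))}$ of the random variable $\xi^{(\alpha)}$ is called the Cramér transform of $F$ with parameter $\alpha$. The random variables $\xi^{(\alpha)}$ are also called Cramér transforms, but of the original random variable $\xi$. The relationship between the distributions of $S_n$ and $S_n^{(\alpha)}$ is established in the following assertion.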


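Theorem 9.2.1 For $x = n\alpha$, $\alpha \in (\alpha_-, \alpha_+)$, and any $t > 0$, one has

$$\mathbf{P}(S_n \in [x, x + t)) = e^{-n\Lambda(\alpha)} \int_0^t e^{-\lambda(\alpha) z}\, \mathbf{P}\big(S_n^{(\alpha)} - \alpha n \in dz\big). \tag{9.2.1}$$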


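Proof The Laplace transform of the distribution of the sum $S_n^{(\alpha)}$ is clearly equal to

$$\mathbf{E}e^{\mu S_n^{(\alpha)}} = \bigg[\frac{\psi(\mu + \lambda(\alpha))}{\psi(\lambda(\alpha))}\bigg]^n \tag{9.2.2}$$

(see (9.1.5)). On the other hand, consider the Cramér transform $(S_n)_{(\lambda(\alpha))}$ of $S_n$ at the point $\lambda(\alpha)$. Applying (9.1.5) to the distribution of $S_n$, we obtain

$$\mathbf{E}e^{\mu (S_n)_{(\lambda(\alpha))}} = \frac{\psi^n(\mu + \lambda(\alpha))}{\psi^n(\lambda(\alpha))}.$$

Since this expression coincides with (9.2.2), the Cramér transform of $S_n$ at the point $\lambda(\alpha)$ coincides in distribution with the sum $S_n^{(\alpha)}$ of the transforms $\xi_i^{(\alpha)}$. In other words,

$$\frac{\mathbf{P}(S_n \in dv)\, e^{\lambda(\alpha) v}}{\psi^n(\lambda(\alpha))} = \mathbf{P}\big(S_n^{(\alpha)} \in dv\big) \tag{9.2.3}$$

or, which is the same,

$$\mathbf{P}(S_n \in dv) = e^{-\lambda(\alpha) v + n \ln \psi(\lambda(\alpha))}\, \mathbf{P}\big(S_n^{(\alpha)} \in dv\big) = e^{-n\Lambda(\alpha) + \lambda(\alpha)(n\alpha - v)}\, \mathbf{P}\big(S_n^{(\alpha)} \in dv\big).$$

Integrating this equality in $v$ from $x$ to $x + t$, letting $x := n\alpha$ and making the change of variables $v - n\alpha = z$, we get

$$\mathbf{P}(S_n \in [x, x + t)) = e^{-n\Lambda(\alpha)} \int_x^{x + t} e^{\lambda(\alpha)(n\alpha - v)}\, \mathbf{P}\big(S_n^{(\alpha)} \in dv\big) = e^{-n\Lambda(\alpha)} \int_0^t e^{-\lambda(\alpha) z}\, \mathbf{P}\big(S_n^{(\alpha)} - \alpha n \in dz\big).$$

The theorem is proved. □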

Since for $\alpha \in [\alpha_-, \alpha_+]$ we have

$$\mathbf{E}\xi^{(\alpha)} = \frac{\psi'(\lambda(\alpha))}{\psi(\lambda(\alpha))} = \alpha$$

(see (9.1.11)), one has $\mathbf{E}\big(S_n^{(\alpha)} - \alpha n\big) = 0$ and so for $t \le c\sqrt{n}$ we have probabilities of normal deviations of $S_n^{(\alpha)} - \alpha n$ on the right-hand side of (9.2.1). This allows us to reduce the problem on large deviations of $S_n$ to the problem on normal deviations of $S_n^{(\alpha)}$. If $\alpha > \alpha_+$, then formula (9.2.1) is still rather useful, as will be shown in Sects. 9.4 and 9.5.
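
The change of measure behind Theorem 9.2.1 can be verified exactly in the Bernoulli scheme, where the Cramér transform of $\mathbf{B}_p$ with parameter $\alpha$ is again a Bernoulli law and $S_n^{(\alpha)}$ is binomial. The sketch below checks the atom-by-atom form of (9.2.3) and the discrete analogue of (9.2.1), in which the integral becomes a sum over lattice points; the values $n = 50$, $p = 0.2$, $\alpha = 0.6$, $t = 5$ are assumptions for illustration.

```python
import math

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p, alpha = 50, 0.2, 0.6
lam = math.log(alpha * (1 - p) / (p * (1 - alpha)))        # lambda(alpha) for the Bernoulli scheme
rate = alpha * math.log(alpha / p) + (1 - alpha) * math.log((1 - alpha) / (1 - p))
# Tilting B_p by exp(lam * y) / psi(lam) gives a Bernoulli law with this success probability:
p_tilted = p * math.exp(lam) / (p * math.exp(lam) + 1 - p)

# Atom-by-atom version of (9.2.3).
max_rel_err = 0.0
for v in range(n + 1):
    lhs = binom_pmf(n, p, v)
    rhs = math.exp(-n * rate + lam * (n * alpha - v)) * binom_pmf(n, p_tilted, v)
    max_rel_err = max(max_rel_err, abs(lhs - rhs) / lhs)

# Discrete analogue of (9.2.1) for the interval [x, x + t) with x = n * alpha.
x, t = 30, 5
lhs_interval = sum(binom_pmf(n, p, v) for v in range(x, x + t))
rhs_interval = math.exp(-n * rate) * sum(
    math.exp(-lam * (v - x)) * binom_pmf(n, p_tilted, v) for v in range(x, x + t))
print(p_tilted, max_rel_err, lhs_interval, rhs_interval)
```

Note that the tilted success probability equals $\alpha$, so the level $x = n\alpha$ is the mean of $S_n^{(\alpha)}$: a large deviation for $S_n$ has become a normal deviation for $S_n^{(\alpha)}$.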


9.2.2  The  Probabilistic  Meaning  of  the  Rate  Function 

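In this section we will prove the following assertion, which clarifies the probabilistic meaning of the function $\Lambda(\alpha)$.

Denote by $\Delta[\alpha) := [\alpha, \alpha + \Delta)$ the interval of length $\Delta$ with the left end at the point $\alpha$. The notation $\Delta_n[\alpha)$, where $\Delta_n$ depends on $n$, will have a similar meaning.

Theorem 9.2.2 For each fixed $\alpha$ and all sequences $\Delta_n$ converging to 0 as $n \to \infty$ slowly enough, one has

$$\Lambda(\alpha) = -\lim_{n \to \infty} \frac{1}{n} \ln \mathbf{P}\Big(\frac{S_n}{n} \in \Delta_n[\alpha)\Big). \tag{9.2.4}$$

This relation can also be written as

$$\mathbf{P}\Big(\frac{S_n}{n} \in \Delta_n[\alpha)\Big) = e^{-n\Lambda(\alpha) + o(n)}.$$

Proof of Theorem 9.2.2 First let $\alpha \in (\alpha_-, \alpha_+)$. Then

$$\mathbf{E}\xi^{(\alpha)} = \alpha, \qquad \operatorname{Var}\xi^{(\alpha)} = (\ln \psi(\lambda))''\big|_{\lambda = \lambda(\alpha)} < \infty$$

and hence, as $n \to \infty$ and $\Delta_n \to 0$ slowly enough (e.g., for $\Delta_n \ge n^{-1/3}$), by the central limit theorem we have

$$\mathbf{P}\big(S_n^{(\alpha)} - \alpha n \in [0, \Delta_n n)\big) \to 1/2.$$

Therefore, by Theorem 9.2.1 for $t = \Delta_n n$, $x = \alpha n$ and by the mean value theorem,

$$\mathbf{P}(S_n \in [x, x + t)) = \Big(\frac{1}{2} + o(1)\Big) e^{-n\Lambda(\alpha) - \lambda(\alpha)\Delta_n n\theta}, \qquad \theta \in (0, 1);$$

$$\frac{1}{n} \ln \mathbf{P}(S_n \in [x, x + t)) = -\Lambda(\alpha) - \lambda(\alpha)\theta\Delta_n + o(1) = -\Lambda(\alpha) + o(1)$$

as $n \to \infty$. This proves (9.2.4) for $\alpha \in (\alpha_-, \alpha_+)$.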

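The further proof is divided into three stages.

(1) The upper bound in the general case. Now let $\alpha$ be arbitrary and $|\lambda(\alpha)| < \infty$. By Theorem 9.2.1 for $t = n\Delta_n$, we have

$$\mathbf{P}\Big(\frac{S_n}{n} \in \Delta_n[\alpha)\Big) \le \exp\{-n\Lambda(\alpha) + \max(|\lambda(0)|, |\lambda(\alpha)|)\, n\Delta_n\}.$$

If $\Delta_n \to 0$ then

$$\limsup_{n \to \infty} \frac{1}{n} \ln \mathbf{P}\Big(\frac{S_n}{n} \in \Delta_n[\alpha)\Big) \le -\Lambda(\alpha). \tag{9.2.5}$$

(This inequality can also be obtained from the exponential Chebyshev inequality (Λ7).)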

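(2) The lower bound in the general case. Let $|\lambda(\alpha)| < \infty$ and $|s_\pm| = \infty$. Introduce "truncated" random variables ${}^{(N)}\xi$ with the distribution

$$\mathbf{P}\big({}^{(N)}\xi \in B\big) = \frac{\mathbf{P}(\xi \in B;\ |\xi| < N)}{\mathbf{P}(|\xi| < N)} = \mathbf{P}\big(\xi \in B \,\big|\, |\xi| < N\big)$$

and endow all the symbols that correspond to ${}^{(N)}\xi$ with the left superscript $(N)$. Then clearly, for each $\lambda$,

$$\mathbf{E}(e^{\lambda\xi};\ |\xi| < N) \uparrow \psi(\lambda), \qquad \mathbf{P}(|\xi| < N) \uparrow 1$$

as $N \to \infty$, so that

$${}^{(N)}\psi(\lambda) = \frac{\mathbf{E}(e^{\lambda\xi};\ |\xi| < N)}{\mathbf{P}(|\xi| < N)} \to \psi(\lambda).$$

The functions ${}^{(N)}\Lambda(\alpha)$ and $\Lambda(\alpha)$ are the least upper bounds for the concave functions $\alpha\lambda - \ln {}^{(N)}\psi(\lambda)$ and $\alpha\lambda - \ln \psi(\lambda)$, respectively. Therefore for each $\alpha$ we also have convergence ${}^{(N)}\Lambda(\alpha) \to \Lambda(\alpha)$ as $N \to \infty$.

Further,

$$\mathbf{P}\Big(\frac{S_n}{n} \in \Delta_n[\alpha)\Big) \ge \mathbf{P}\Big(\frac{S_n}{n} \in \Delta_n[\alpha);\ |\xi_j| < N,\ j = 1, \ldots, n\Big) = \mathbf{P}^n(|\xi| < N)\, \mathbf{P}\Big(\frac{{}^{(N)}S_n}{n} \in \Delta_n[\alpha)\Big).$$

Since $s_\pm = \pm\infty$, one has ${}^{(N)}\alpha_\pm = \pm N$ and, for $N$ large enough, we have $\alpha \in ({}^{(N)}\alpha_-, {}^{(N)}\alpha_+)$. Hence we can apply the first part of the proof of the theorem by virtue of which, as $\Delta_n \to 0$,

$$\frac{1}{n} \ln \mathbf{P}\Big(\frac{{}^{(N)}S_n}{n} \in \Delta_n[\alpha)\Big) = -{}^{(N)}\Lambda(\alpha) + o(1),$$

$$\frac{1}{n} \ln \mathbf{P}\Big(\frac{S_n}{n} \in \Delta_n[\alpha)\Big) \ge -{}^{(N)}\Lambda(\alpha) + o(1) + \ln \mathbf{P}(|\xi| < N).$$

The right-hand side of the last inequality can be made arbitrarily close to $-\Lambda(\alpha)$ by choosing a suitable $N$. Since the left-hand side of this inequality does not depend on $N$, we have

$$\liminf_{n \to \infty} \frac{1}{n} \ln \mathbf{P}\Big(\frac{S_n}{n} \in \Delta_n[\alpha)\Big) \ge -\Lambda(\alpha). \tag{9.2.6}$$

Together with (9.2.5), this proves (9.2.4).

(3) It remains to remove the restrictions stated at the beginning of stages (1) and (2) of the proof, i.e. to consider the cases $|\lambda(\alpha)| = \infty$ and $\min|s_\pm| < \infty$. These two relations are connected with each other since, for instance, the equality $\lambda(\alpha) = \lambda_+ = \infty$ can only hold if $\alpha \ge \alpha_+ = s_+ < \infty$ (see property (Λ2)). For $\alpha > s_+$, relation (9.2.4) is evident, since $\mathbf{P}(S_n/n \in \Delta_n[\alpha)) = 0$ and $\Lambda(\alpha) = \infty$. For $\alpha = \alpha_+ = s_+$ and $p_+ = \mathbf{P}(\xi = s_+)$, we have, for any $\Delta > 0$,

$$\mathbf{P}\Big(\frac{S_n}{n} \in \Delta[\alpha_+)\Big) = \mathbf{P}(S_n = n\alpha_+) = p_+^n. \tag{9.2.7}$$

Since in this case $\Lambda(\alpha_+) = -\ln p_+$ (see (Λ2)), the equality (9.2.4) holds true.

The case $\lambda(\alpha) = \lambda_- = -\infty$ with $s_- > -\infty$ is considered in a similar way. However, due to the asymmetry of the interval $\Delta[\alpha)$ with respect to the point $\alpha$, there are small differences. Instead of an equality in (9.2.7) we only have the inequality

$$\mathbf{P}\Big(\frac{S_n}{n} \in \Delta_n[\alpha_-)\Big) \ge \mathbf{P}(S_n = n\alpha_-) = p_-^n, \qquad p_- = \mathbf{P}(\xi = \alpha_-). \tag{9.2.8}$$

Therefore we also have to use the exponential Chebyshev inequality (see (Λ7)), applying it to $-S_n$ for $s_- = \alpha_- < 0$:

$$\mathbf{P}\Big(\frac{S_n}{n} \in \Delta_n[\alpha_-)\Big) \le \mathbf{P}\Big(\frac{S_n}{n} < \alpha_- + \Delta_n\Big) \le e^{-n\Lambda(\alpha_- + \Delta_n)}. \tag{9.2.9}$$

Relations (9.2.8), (9.2.9), the equality $\Lambda(\alpha_-) = -\ln p_-$, and the right continuity of $\Lambda(\alpha)$ at the point $\alpha_-$ imply (9.2.4) for $\alpha = \alpha_-$. The theorem is proved. □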
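
The convergence in (9.2.4) can be observed numerically. The sketch below, which assumes symmetric Bernoulli summands ($p = 0.5$), the level $\alpha = 0.75$ and the window $\Delta_n = n^{-1/3}$ (all chosen only for illustration), evaluates $-\frac{1}{n}\ln \mathbf{P}(S_n/n \in \Delta_n[\alpha))$ exactly via logarithms of binomial probabilities and compares it with $\Lambda(\alpha)$.

```python
import math

def log_binom_pmf(n, p, k):
    """ln P(Bin(n, p) = k), computed through lgamma to avoid underflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def log_prob_interval(n, p, alpha, delta):
    """ln P(S_n / n in [alpha, alpha + delta)) for S_n ~ Bin(n, p), via log-sum-exp."""
    ks = [k for k in range(math.ceil(n * alpha), n + 1) if k < n * (alpha + delta)]
    logs = [log_binom_pmf(n, p, k) for k in ks]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

p, alpha = 0.5, 0.75
rate = alpha * math.log(alpha / p) + (1 - alpha) * math.log((1 - alpha) / (1 - p))

errors = []
for n in (200, 2000, 20000):
    delta_n = n ** (-1 / 3)
    estimate = -log_prob_interval(n, p, alpha, delta_n) / n
    errors.append(estimate - rate)
    print(f"n = {n:6d}: -(1/n) ln P = {estimate:.6f}, Lambda(alpha) = {rate:.6f}")
```

The discrepancy shrinks as $n$ grows, but only at a rate of order $(\ln n)/n$, in line with the $o(n)$ term in the exponent.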


9.2.3  The  Large  Deviations  Principle 

It is not hard to derive from Theorem 9.2.2 a corollary on the asymptotics of the probabilities of $S_n/n$ hitting an arbitrary Borel set. Denote by $(B)$ and $[B]$ the interior and the closure of $B$, respectively ($(B)$ is the union of all open intervals contained in $B$). Put

$$\Lambda(B) := \inf_{\alpha \in B} \Lambda(\alpha).$$

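Theorem 9.2.3 For any Borel set $B$, the following inequalities hold:

$$\liminf_{n \to \infty} \frac{1}{n} \ln \mathbf{P}\Big(\frac{S_n}{n} \in B\Big) \ge -\Lambda((B)), \tag{9.2.10}$$

$$\limsup_{n \to \infty} \frac{1}{n} \ln \mathbf{P}\Big(\frac{S_n}{n} \in B\Big) \le -\Lambda([B]). \tag{9.2.11}$$

If $\Lambda((B)) = \Lambda([B])$, then the following limit exists:

$$\lim_{n \to \infty} \frac{1}{n} \ln \mathbf{P}\Big(\frac{S_n}{n} \in B\Big) = -\Lambda(B). \tag{9.2.12}$$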

This assertion is called the large deviation principle. It is one of the so-called "rough" ("logarithmic") limit theorems that describe the asymptotic behaviour of $\ln \mathbf{P}(S_n/n \in B)$. It is usually impossible to derive from this assertion the asymptotics of the probability $\mathbf{P}(S_n/n \in B)$ itself. (In the equality $\mathbf{P}(S_n/n \in B) = \exp\{-n\Lambda(B) + o(n)\}$, the term $o(n)$ may grow in absolute value.)
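
To see why the large deviation principle is only a "rough" statement, one can compare the logarithmic asymptotics with the probability itself for $B = [\alpha, \infty)$. The sketch below assumes symmetric Bernoulli summands and $\alpha = 0.75$ (illustrative choices); it prints both $\frac{1}{n}\ln \mathbf{P}(S_n/n \in B) + \Lambda(\alpha)$, which tends to 0, and $\ln\big(\mathbf{P}(S_n/n \in B)\, e^{n\Lambda(\alpha)}\big)$, which is the $o(n)$ term and does not stay bounded.

```python
import math

def log_binom_tail(n, p, k):
    """ln P(Bin(n, p) >= k), computed with log-sum-exp to avoid underflow."""
    logs = [math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
            + j * math.log(p) + (n - j) * math.log(1 - p) for j in range(k, n + 1)]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

p, alpha = 0.5, 0.75
rate = alpha * math.log(alpha / p) + (1 - alpha) * math.log((1 - alpha) / (1 - p))

rows = []
for n in (100, 1000, 10000):
    log_p = log_binom_tail(n, p, int(n * alpha))
    # (n, logarithmic error per summand, the o(n) term in the exponent)
    rows.append((n, log_p / n + rate, log_p + n * rate))
    print(f"n = {n:5d}: (1/n) ln P + Lambda = {rows[-1][1]:+.6f}, "
          f"ln(P * exp(n*Lambda)) = {rows[-1][2]:+.4f}")
```

The first column confirms (9.2.12), while the second shows that $e^{-n\Lambda(B)}$ alone misses a factor that keeps decreasing with $n$; recovering such factors is the subject of the exact theorems in Sect. 9.3.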

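Proof Without losing generality, we can assume that $B \subset [s_-, s_+]$ (since $\Lambda(\alpha) = \infty$ outside that domain).

We first prove (9.2.10). Let $\alpha_{(B)}$ be such that

$$\Lambda((B)) = \inf_{\alpha \in (B)} \Lambda(\alpha) = \Lambda(\alpha_{(B)})$$

(recall that $\Lambda(\alpha)$ is continuous on $[s_-, s_+]$). Then there exist a sequence of points $\alpha_k$ and a sequence of intervals $(\alpha_k - \delta_k, \alpha_k + \delta_k)$, where $\delta_k \to 0$, lying in $(B)$ and converging to the point $\alpha_{(B)}$, such that

$$\Lambda((B)) = \inf_k \Lambda\big((\alpha_k - \delta_k, \alpha_k + \delta_k)\big).$$

Here clearly

$$\inf_k \Lambda\big((\alpha_k - \delta_k, \alpha_k + \delta_k)\big) = \inf_k \Lambda(\alpha_k),$$

and for a given $\varepsilon > 0$, there exists a $k = K$ such that $\Lambda(\alpha_K) < \Lambda((B)) + \varepsilon$. Since $\Delta_n[\alpha_K) \subset (\alpha_K - \delta_K, \alpha_K + \delta_K)$ for large enough $n$ (here $\Delta_n[\alpha_K)$ is from Theorem 9.2.2), we have by Theorem 9.2.2 that, as $n \to \infty$,

$$\begin{aligned}
\frac{1}{n} \ln \mathbf{P}\Big(\frac{S_n}{n} \in B\Big) &\ge \frac{1}{n} \ln \mathbf{P}\Big(\frac{S_n}{n} \in (B)\Big) \\
&\ge \frac{1}{n} \ln \mathbf{P}\Big(\frac{S_n}{n} \in (\alpha_K - \delta_K, \alpha_K + \delta_K)\Big) \\
&\ge \frac{1}{n} \ln \mathbf{P}\Big(\frac{S_n}{n} \in \Delta_n[\alpha_K)\Big) \ge -\Lambda(\alpha_K) + o(1) \\
&\ge -\Lambda((B)) - \varepsilon + o(1).
\end{aligned}$$

As the left-hand side of this inequality does not depend on $\varepsilon$, inequality (9.2.10) is proved.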

We now prove inequality (9.2.11). Denote by $\alpha_{[B]}$ the point at which $\inf_{\alpha \in [B]} \Lambda(\alpha) = \Lambda(\alpha_{[B]})$ is attained (this point always belongs to $[B]$ since $[B]$ is closed). If $\Lambda(\alpha_{[B]}) = 0$, then the inequality is evident. Now let $\Lambda(\alpha_{[B]}) > 0$. By convexity of $\Lambda$ the equation $\Lambda(\alpha) = \Lambda(\alpha_{[B]})$ can have a second solution $\alpha'_{[B]}$. Assume it exists and, for definiteness, $\alpha'_{[B]} < \alpha_{[B]}$. The relation $\Lambda([B]) = \Lambda(\alpha_{[B]})$ means that the set $[B]$ does not intersect with $(\alpha'_{[B]}, \alpha_{[B]})$ and

$$\mathbf{P}\Big(\frac{S_n}{n} \in B\Big) \le \mathbf{P}\Big(\frac{S_n}{n} \in [B]\Big) \le \mathbf{P}\Big(\frac{S_n}{n} \le \alpha'_{[B]}\Big) + \mathbf{P}\Big(\frac{S_n}{n} \ge \alpha_{[B]}\Big). \tag{9.2.13}$$

Moreover, in this case $m_1 \in (\alpha'_{[B]}, \alpha_{[B]})$ and each of the probabilities on the right-hand side of (9.2.13) can be bounded using the exponential Chebyshev inequality (see ($\Lambda 7$)) by the value $e^{-n\Lambda(\alpha_{[B]})}$. This implies (9.2.11).

If the second solution does not exist, then one of the summands on the right-hand side of (9.2.13) equals zero, and we obtain the same result.

The second assertion of the theorem (Eq. (9.2.12)) is evident.

The theorem is proved. □
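
The logarithmic asymptotics of Theorem 9.2.3 can be observed numerically. The following Python sketch is an illustration under assumptions that are not taken from the text: the summands are Bernoulli with parameter $p$, the set is $B = [\alpha, 1]$ with $\alpha > p$, and the helper names are hypothetical. For this example the rate function has the closed form $\Lambda(\alpha) = \alpha \ln(\alpha/p) + (1-\alpha)\ln((1-\alpha)/(1-p))$, and the exact binomial tail is summed in the log domain.

```python
import math

def rate_bernoulli(alpha: float, p: float) -> float:
    """Rate function Lambda(alpha) of a Bernoulli(p) summand, alpha in (0, 1)."""
    return alpha * math.log(alpha / p) + (1 - alpha) * math.log((1 - alpha) / (1 - p))

def log_binom_pmf(n: int, k: int, p: float) -> float:
    """ln P(S_n = k) for S_n ~ Binomial(n, p), computed via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def log_tail(n: int, alpha: float, p: float) -> float:
    """ln P(S_n / n >= alpha), summed stably in the log domain."""
    k0 = math.ceil(alpha * n)
    logs = [log_binom_pmf(n, k, p) for k in range(k0, n + 1)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

if __name__ == "__main__":
    p, alpha = 0.5, 0.7
    for n in (100, 1000, 10000):
        # (1/n) ln P(S_n/n in B) should approach -Lambda(alpha) as n grows.
        print(n, log_tail(n, alpha, p) / n, -rate_bernoulli(alpha, p))
```

The normalised logarithm approaches $-\Lambda(\alpha)$ only slowly, which is consistent with the remark that the term $o(n)$ in the exponent need not be bounded.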

Using  Theorem  9.2.3,  we  can  complement  Theorem  9.2.2  with  the  following 
assertion. 
Corollary 9.2.1 The following limit always exists:

$$\lim_{\Delta \to 0} \lim_{n \to \infty} \frac{1}{n} \ln \mathbf{P}\Big(\frac{S_n}{n} \in \Delta[\alpha)\Big) = -\Lambda(\alpha). \tag{9.2.14}$$

Proof Take the set $B$ in Theorem 9.2.3 to be the interval $B = \Delta[\alpha)$. If $\alpha \notin [s_-, s_+]$ then the assertion is obvious (since both sides of (9.2.14) are equal to $-\infty$). If $\alpha = s_\pm$ then (9.2.14) is already proved in (9.2.7), (9.2.8) and (9.2.9).

It remains to consider points $\alpha \in (s_-, s_+)$. For such $\alpha$, the function $\Lambda(\alpha)$ is continuous and $\alpha + \Delta$ is also a point of continuity of $\Lambda$ for $\Delta$ small enough, and hence

$$\Lambda((B)) = \Lambda([B]) \to \Lambda(\alpha)$$

as $\Delta \to 0$. Therefore by Theorem 9.2.3 the inner limit in (9.2.14) exists and converges to $-\Lambda(\alpha)$ as $\Delta \to 0$.

The corollary is proved. □


Note  that  the  assertions  of  Theorems  9.2.2  and  9.2.3  and  their  corollaries  are 
“universal” — they  contain  no  restrictions  on  the  distribution  F. 


9.3 Integro-Local, Integral and Local Theorems on Large
Deviation Probabilities in the Cramér Range

9.3.1 Integro-Local and Integral Theorems

In this subsection, under the assumption that the Cramér condition $\lambda_+ > 0$ is met, we will find the asymptotics of probabilities $\mathbf{P}(S_n \in \Delta[x))$ for scaled deviations $\alpha = x/n$ from the so-called Cramér (or regular) range, i.e. for the range $\alpha \in (\alpha_-, \alpha_+)$ in which the rate function $\Lambda(\alpha)$ is analytic.

In the non-lattice case, in addition to the condition $\lambda_+ > 0$, we will assume without loss of generality that $\mathbf{E}\xi = 0$. In this case necessarily

$$\alpha_- \le 0, \qquad \alpha_+ > 0.$$

The length $\Delta$ of the interval may depend on $n$ in some cases. In such cases, we will write $\Delta_n$ instead of $\Delta$, as we did earlier. The value

$$\sigma_\alpha^2 = \frac{\psi''(\lambda(\alpha))}{\psi(\lambda(\alpha))} - \alpha^2 \tag{9.3.1}$$

is clearly equal to $\mathrm{Var}(\xi^{(\alpha)})$ (see (9.1.5) and the definition of $\xi^{(\alpha)}$ in Sect. 9.2).
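
The quantities $\lambda(\alpha)$, $\Lambda(\alpha)$ and $\sigma_\alpha^2$ can be computed numerically once $\psi$ is known. The sketch below does this for an assumed example that is not taken from the text: $\xi = \eta - 1$ with $\eta$ exponentially distributed with unit rate, so that $\psi(\lambda) = e^{-\lambda}/(1-\lambda)$, $\lambda_+ = 1$ and $\mathbf{E}\xi = 0$. All function names are hypothetical. For this example the closed forms $\lambda(\alpha) = \alpha/(1+\alpha)$, $\Lambda(\alpha) = \alpha - \ln(1+\alpha)$ and $\sigma_\alpha^2 = (1+\alpha)^2$ are available for comparison.

```python
import math

# Assumed example (not from the text): xi = eta - 1 with eta ~ Exp(1), so that
# psi(lam) = exp(-lam) / (1 - lam) for lam < lambda_plus = 1 and E xi = 0.

def log_psi(lam: float) -> float:
    """ln psi(lam) for the centred exponential summand."""
    return -lam - math.log1p(-lam)

def tilted_mean(lam: float) -> float:
    """psi'(lam) / psi(lam): the mean of the Cramer transform at lam."""
    return -1.0 + 1.0 / (1.0 - lam)

def tilted_second_moment(lam: float) -> float:
    """psi''(lam) / psi(lam): the second moment of the Cramer transform at lam."""
    return 1.0 / (1.0 - lam) ** 2 + tilted_mean(lam) ** 2

def lam_of_alpha(alpha: float) -> float:
    """Solve psi'(lam) / psi(lam) = alpha on [0, 1) by bisection (alpha >= 0)."""
    lo, hi = 0.0, 1.0 - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rate(alpha: float) -> float:
    """Lambda(alpha) = alpha * lam(alpha) - ln psi(lam(alpha))."""
    lam = lam_of_alpha(alpha)
    return alpha * lam - log_psi(lam)

def sigma2(alpha: float) -> float:
    """sigma_alpha^2 as in (9.3.1): psi''/psi at lam(alpha) minus alpha^2."""
    return tilted_second_moment(lam_of_alpha(alpha)) - alpha ** 2

if __name__ == "__main__":
    a = 0.5
    print(lam_of_alpha(a), a / (1 + a))
    print(rate(a), a - math.log1p(a))
    print(sigma2(a), (1 + a) ** 2)
```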
Theorem 9.3.1 Let $\lambda_+ > 0$, $\alpha \in [0, \alpha_+)$, $\xi$ be a non-lattice random variable, $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 < \infty$. If $\Delta_n \to 0$ slowly enough as $n \to \infty$, then

$$\mathbf{P}(S_n \in \Delta_n[x)) = \frac{\Delta_n}{\sigma_\alpha \sqrt{2\pi n}}\, e^{-n\Lambda(\alpha)} (1 + o(1)), \tag{9.3.2}$$

where $\alpha = x/n$, and, for each fixed $\alpha_1 \in (0, \alpha_+)$, the remainder term $o(1)$ is uniform in $\alpha \in [0, \alpha_1]$.

A similar assertion is valid in the case when $\lambda_- < 0$ and $\alpha \in (\alpha_-, 0]$.


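Proof The proof is based on Theorems 9.2.1 and 8.7.1A. Since the conditions of Theorem 9.2.1 are satisfied, we have

$$\mathbf{P}(S_n \in \Delta_n[x)) = e^{-n\Lambda(\alpha)} \int_0^{\Delta_n} e^{-\lambda(\alpha) z}\, \mathbf{P}(S_n^{(\alpha)} - \alpha n \in dz).$$

As $\lambda(\alpha) \le \lambda(\alpha_+ - \varepsilon) < \infty$ and $\Delta_n \to 0$, one has $e^{-\lambda(\alpha) z} \to 1$ uniformly in $z \in \Delta_n[0)$ and hence, as $n \to \infty$,

$$\mathbf{P}(S_n \in \Delta_n[x)) = e^{-n\Lambda(\alpha)}\, \mathbf{P}(S_n^{(\alpha)} - \alpha n \in \Delta_n[0))(1 + o(1)) \tag{9.3.3}$$

uniformly in $\alpha \in [0, \alpha_+ - \varepsilon]$.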

We now show that Theorem 8.7.1A is applicable to the random variables $\xi^{(\alpha)} = \xi_{(\lambda(\alpha))}$. That $\sigma_\alpha = \sigma(\lambda(\alpha))$ is bounded away from $0$ and from $\infty$ for $\alpha \in [0, \alpha_1]$ is evident. (The same is true of all the theorems in this section.) Therefore, it remains to verify whether conditions (a) and (b) of Theorem 8.7.1A are met for $\lambda = \lambda(\alpha) \in [0, \lambda_1]$, $\lambda_1 := \lambda(\alpha_1) < \lambda_+$ and $\varphi_{(\lambda)}(t) = \psi(\lambda + it)/\psi(\lambda)$ (see (9.1.5)). We have

$$\psi(\lambda + it) = \psi(\lambda) + it\psi'(\lambda) - \frac{t^2}{2}\psi''(\lambda) + o(t^2)$$

as $t \to 0$, where the remainder term is uniform in $\lambda$ if the function $\psi''(\lambda + iu)$ is uniformly continuous in $u$. The required uniform continuity can easily be proved by imitating the corresponding result for ch.f.s (see property 4 in Sect. 7.1). This proves condition (a) in Theorem 8.7.1A with

$$a(\lambda) = \frac{\psi'(\lambda)}{\psi(\lambda)}, \qquad m_2(\lambda) = \frac{\psi''(\lambda)}{\psi(\lambda)}.$$

Now we will verify condition (b) in Theorem 8.7.1A. Assume the contrary: there exists a sequence $\lambda_k \in [0, \lambda_1]$ such that

$$q_{(\lambda_k)} := \sup_{\theta_1 \le |t| \le \theta_2} \frac{|\psi(\lambda_k + it)|}{\psi(\lambda_k)} \to 1$$

as $k \to \infty$. By the uniform continuity of $\psi$ in that domain, there exist points $t_k$, $|t_k| \in [\theta_1, \theta_2]$, such that, as $k \to \infty$,

$$\frac{|\psi(\lambda_k + it_k)|}{\psi(\lambda_k)} \to 1.$$

Since the region $\lambda \in [0, \lambda_1]$, $|t| \in [\theta_1, \theta_2]$ is compact, there exists a subsequence $(\lambda_{k'}, t_{k'}) \to (\lambda_0, t_0)$ as $k' \to \infty$. Again using the continuity of $\psi$, we obtain the equality

$$\frac{|\psi(\lambda_0 + it_0)|}{\psi(\lambda_0)} = 1, \tag{9.3.4}$$

which contradicts the non-latticeness of $\xi_{(\lambda_0)}$. Property (b) is proved.

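Thus we can now apply Theorem 8.7.1A to the probability on the right-hand side of (9.3.3). Since $\mathbf{E}\xi^{(\alpha)} = \alpha$ and $\mathbf{E}(\xi^{(\alpha)})^2 = \psi''(\lambda(\alpha))/\psi(\lambda(\alpha))$, this yields

$$\begin{aligned}
\mathbf{P}(S_n \in \Delta_n[x)) &= e^{-n\Lambda(\alpha)} \Big[\frac{\Delta_n}{\sigma_\alpha \sqrt{n}}\, \varphi(0) + o\Big(\frac{\Delta_n}{\sqrt{n}}\Big)\Big](1 + o(1)) \\
&= \frac{\Delta_n}{\sigma_\alpha \sqrt{2\pi n}}\, e^{-n\Lambda(\alpha)}(1 + o(1))
\end{aligned} \tag{9.3.5}$$

uniformly in $\alpha \in [0, \alpha_1]$ (or in $x \in [0, \alpha_1 n]$), where the values of

$$\sigma_\alpha^2 = \frac{\psi''(\lambda(\alpha))}{\psi(\lambda(\alpha))} - \alpha^2$$

are bounded away from $0$ and from $\infty$. The theorem is proved. □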


From Theorem 9.3.1 we can now derive integro-local theorems and integral theorems for fixed or growing $\Delta$. Since in the normal deviation range (when $x$ is comparable with $\sqrt{n}$) we have already obtained such results, to simplify the exposition we will consider here large deviations only, when $x \gg \sqrt{n}$ or, which is the same, $\alpha = x/n \gg 1/\sqrt{n}$. To be more precise, we will assume that there exists a function $N(n) \to \infty$, $N(n) = o(\sqrt{n})$ as $n \to \infty$, such that $x \ge N(n)\sqrt{n}$ ($\alpha \ge N(n)/\sqrt{n}$).

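Theorem 9.3.2 Let $\lambda_+ > 0$, $\alpha \in [0, \alpha_+)$, $\xi$ be non-lattice, $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 < \infty$. Then, for any $\Delta \ge \Delta_0 > 0$, $x \ge N(n)\sqrt{n}$, $N(n) = o(\sqrt{n})$, $N(n) \to \infty$ as $n \to \infty$, one has

$$\mathbf{P}(S_n \in \Delta[x)) = \frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha \lambda(\alpha) \sqrt{2\pi n}} \big(1 - e^{-\lambda(\alpha)\Delta}\big)(1 + o(1)), \tag{9.3.6}$$

$o(1)$ being uniform in $\alpha = x/n \in [N(n)/\sqrt{n}, \alpha_1]$ and $\Delta \ge \Delta_0$ for each fixed $\alpha_1 \in (0, \alpha_+)$.

In particular (for $\Delta = \infty$),

$$\mathbf{P}(S_n \ge x) = \frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha \lambda(\alpha) \sqrt{2\pi n}} (1 + o(1)). \tag{9.3.7}$$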
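Proof Partition the interval $\Delta[x)$ into subintervals $\Delta_n[x + k\Delta_n)$, $k = 0, \ldots, \Delta/\Delta_n - 1$, where $\Delta_n \to 0$ and, for simplicity, we assume that $M = \Delta/\Delta_n$ is an integer. Then, by Theorem 9.2.1, as $\Delta_n \to 0$,

$$\begin{aligned}
\mathbf{P}(S_n \in \Delta_n[x + k\Delta_n)) &= \mathbf{P}(S_n \in [x, x + (k+1)\Delta_n)) - \mathbf{P}(S_n \in [x, x + k\Delta_n)) \\
&= e^{-n\Lambda(\alpha)} \int_{k\Delta_n}^{(k+1)\Delta_n} e^{-\lambda(\alpha) z}\, \mathbf{P}(S_n^{(\alpha)} - \alpha n \in dz) \\
&= e^{-n\Lambda(\alpha) - \lambda(\alpha) k \Delta_n}\, \mathbf{P}(S_n^{(\alpha)} - \alpha n \in \Delta_n[k\Delta_n))(1 + o(1))
\end{aligned} \tag{9.3.8}$$

uniformly in $\alpha \in [0, \alpha_1]$. Here, similarly to (9.3.5), by Theorem 8.7.1A we have

$$\mathbf{P}(S_n^{(\alpha)} - \alpha n \in \Delta_n[k\Delta_n)) = \frac{\Delta_n}{\sigma_\alpha \sqrt{n}}\, \varphi\Big(\frac{k\Delta_n}{\sigma_\alpha \sqrt{n}}\Big) + o\Big(\frac{\Delta_n}{\sqrt{n}}\Big) \tag{9.3.9}$$

uniformly in $k$ and $\alpha$. Since

$$\mathbf{P}(S_n \in \Delta[x)) = \sum_{k=0}^{M-1} \mathbf{P}(S_n \in \Delta_n[x + k\Delta_n)),$$

substituting the values (9.3.8) and (9.3.9) into the right-hand side of the last equality, we obtain

$$\begin{aligned}
\mathbf{P}(S_n \in \Delta[x)) &= \frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha \sqrt{n}} \sum_{k=0}^{M-1} \Delta_n e^{-\lambda(\alpha) k \Delta_n} \Big[\varphi\Big(\frac{k\Delta_n}{\sigma_\alpha \sqrt{n}}\Big) + o(1)\Big] \\
&= \frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha \sqrt{n}} \int_0^{\Delta - \Delta_n} e^{-\lambda(\alpha) z} \Big[\varphi\Big(\frac{z}{\sigma_\alpha \sqrt{n}}\Big) + o(1)\Big]\, dz.
\end{aligned} \tag{9.3.10}$$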

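After the variable change $\lambda(\alpha) z = u$, the right-hand side can be rewritten as

$$\frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha \lambda(\alpha) \sqrt{n}} \int_0^{(\Delta - \Delta_n)\lambda(\alpha)} e^{-u} \Big[\varphi\Big(\frac{u}{\sigma_\alpha \lambda(\alpha) \sqrt{n}}\Big) + o(1)\Big]\, du, \tag{9.3.11}$$

where the remainder term $o(1)$ is uniform in $\alpha \in [0, \alpha_1]$, $\Delta \ge \Delta_0$, and $u$ from the integration range. Since $\lambda(\alpha) \sim \alpha/\sigma^2$ for small $\alpha$ (see (9.1.12) and (9.1.16)), for $\alpha \ge N(n)/\sqrt{n}$ we have

$$\lambda(\alpha) \ge \frac{N(n)}{\sigma^2 \sqrt{n}}(1 + o(1)), \qquad \sigma_\alpha \lambda(\alpha) \sqrt{n} \ge \frac{\sigma_\alpha N(n)}{\sigma^2}(1 + o(1)) \to \infty.$$

Therefore, for any fixed $u$, one has

$$\varphi\Big(\frac{u}{\sigma_\alpha \lambda(\alpha) \sqrt{n}}\Big) \to \varphi(0) = \frac{1}{\sqrt{2\pi}}.$$

Moreover, $\varphi(v) \le 1/\sqrt{2\pi}$ for all $v$. Hence, by (9.3.10) and (9.3.11),

$$\begin{aligned}
\mathbf{P}(S_n \in \Delta[x)) &= \frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha \lambda(\alpha) \sqrt{2\pi n}} \int_0^{\lambda(\alpha)\Delta} e^{-u}\, du\, (1 + o(1)) \\
&= \frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha \lambda(\alpha) \sqrt{2\pi n}} \big(1 - e^{-\lambda(\alpha)\Delta}\big)(1 + o(1))
\end{aligned}$$

uniformly in $\alpha \in [N(n)/\sqrt{n}, \alpha_1]$ and $\Delta \ge \Delta_0$. Relation (9.3.7) clearly follows from (9.3.6) with $\Delta = \infty$. The theorem is proved. □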
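
Relation (9.3.7) can be checked against an exact tail in a case where one is available. The sketch below again uses an assumed example that is not in the text: $\xi = \eta - 1$ with $\eta$ exponential with unit rate, for which $S_n + n$ has the Gamma distribution with shape $n$, $\Lambda(\alpha) = \alpha - \ln(1+\alpha)$, $\lambda(\alpha) = \alpha/(1+\alpha)$ and $\sigma_\alpha = 1 + \alpha$, so that $\sigma_\alpha \lambda(\alpha) = \alpha$. The function names are hypothetical.

```python
import math

def log_exact_tail(n: int, alpha: float) -> float:
    """ln P(S_n >= alpha * n), where S_n + n ~ Gamma(n, 1) (assumed example)."""
    t = n * (1.0 + alpha)
    # P(Gamma(n, 1) >= t) = exp(-t) * sum_{k < n} t^k / k!.
    # Factor out the largest term (k = n - 1) and sum the ratios downwards.
    log_last = -t + (n - 1) * math.log(t) - math.lgamma(n)
    term, total = 1.0, 1.0
    for i in range(1, n):
        term *= (n - i) / t   # ratio of the term with index n-1-i to the one above it
        total += term
    return log_last + math.log(total)

def log_cramer_tail(n: int, alpha: float) -> float:
    """ln of the right-hand side of (9.3.7) without the factor (1 + o(1))."""
    rate = alpha - math.log1p(alpha)   # Lambda(alpha) for this example
    sigma_lambda = alpha               # sigma_alpha * lambda(alpha) = alpha here
    return -n * rate - math.log(sigma_lambda * math.sqrt(2.0 * math.pi * n))

if __name__ == "__main__":
    for n in (50, 500, 5000):
        # The difference of logarithms should tend to 0 as n grows.
        print(n, log_exact_tail(n, 0.5) - log_cramer_tail(n, 0.5))
```

Working with logarithms avoids underflow, since both sides decay exponentially in $n$.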

Note that if $\mathbf{E}|\xi|^k < \infty$ (for $\lambda_+ > 0$ this is a restriction on the rate of decay of the left tails $\mathbf{P}(\xi < -t)$, $t > 0$), then expansion (9.1.17) is valid and, for deviations $x = o(n)$ ($\alpha = o(1)$) such that $n\alpha^k = x^k/n^{k-1} \le c = \mathrm{const}$, we can change the exponent $n\Lambda(\alpha)$ in (9.3.6) and (9.3.7) to

$$n\Lambda(\alpha) = n \sum_{j=2}^{k} \frac{\Lambda^{(j)}(0)}{j!}\, \alpha^j + o(n\alpha^k), \tag{9.3.12}$$

where $\Lambda^{(j)}(0)$ are found in (9.1.16). For $k = 3$, the foregoing implies the following.


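Corollary 9.3.1 Let $\lambda_+ > 0$, $\mathbf{E}|\xi|^3 < \infty$, $\xi$ be non-lattice, $\mathbf{E}\xi = 0$, $\mathbf{E}\xi^2 = \sigma^2$, $x \gg \sqrt{n}$ and $x = o(n^{2/3})$ as $n \to \infty$. Then

$$\mathbf{P}(S_n \ge x) \sim \frac{\sigma \sqrt{n}}{x \sqrt{2\pi}} \exp\Big\{-\frac{x^2}{2n\sigma^2}\Big\} \sim \Phi\Big(-\frac{x}{\sigma\sqrt{n}}\Big). \tag{9.3.13}$$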


In the last relation we used the symmetry of the standard normal law, i.e. the equality $1 - \Phi(t) = \Phi(-t)$. Assertion (9.3.13) shows that in the case $\lambda_+ > 0$ and $\mathbf{E}|\xi|^3 < \infty$ the asymptotic equivalence

$$\mathbf{P}(S_n \ge x) \sim \Phi\Big(-\frac{x}{\sigma\sqrt{n}}\Big)$$

persists outside the range of normal deviations as well, up to the values $x = o(n^{2/3})$. If $\mathbf{E}\xi^3 = 0$ and $\mathbf{E}\xi^4 < \infty$, then this equivalence holds true up to the values $x = o(n^{3/4})$. For larger $x$ this equivalence, generally speaking, no longer holds.


Proof of Corollary 9.3.1 The first relation in (9.3.13) follows from Theorem 9.3.2 and (9.3.12). The second follows from the asymptotic equivalence

$$\int_x^\infty e^{-u^2/2}\, du \sim \frac{e^{-x^2/2}}{x} \quad \text{as } x \to \infty,$$

which is easy to establish, using, for example, l'Hôpital's rule. □
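
The asymptotic equivalence used in the proof of Corollary 9.3.1 is easy to check numerically. The following sketch (with hypothetical function names) expresses the integral through the complementary error function, $\int_x^\infty e^{-u^2/2}\,du = \sqrt{\pi/2}\,\operatorname{erfc}(x/\sqrt{2})$, and prints its ratio to $e^{-x^2/2}/x$.

```python
import math

def upper_gauss_integral(x: float) -> float:
    """Integral of exp(-u^2 / 2) over [x, infinity), via erfc."""
    return math.sqrt(math.pi / 2.0) * math.erfc(x / math.sqrt(2.0))

def ratio_to_asymptotics(x: float) -> float:
    """Ratio of the integral to exp(-x^2 / 2) / x; it should tend to 1."""
    return upper_gauss_integral(x) / (math.exp(-x * x / 2.0) / x)

if __name__ == "__main__":
    for x in (2.0, 5.0, 10.0, 20.0):
        print(x, ratio_to_asymptotics(x))
```

The ratio stays below 1 and approaches it at the rate $1/x^2$, which is why the second relation in (9.3.13) requires $x/(\sigma\sqrt{n}) \to \infty$.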




9.3.2  Local  Theorems 


In this subsection we will obtain analogues of the local Theorems 8.7.2 and 8.7.3 for large deviations in the Cramér range. To simplify the exposition, we will formulate the theorem for densities, assuming that the following condition is satisfied:

[D] The distribution $\mathbf{F}$ has a bounded density $f(x)$ such that

$$f(x) = e^{-\lambda_+ x + o(x)} \quad \text{as } x \to \infty, \quad \text{if } \lambda_+ < \infty; \tag{9.3.14}$$

$$f(x) \le c e^{-\lambda x} \quad \text{for any fixed } \lambda > 0,\ c = c(\lambda), \quad \text{if } \lambda_+ = \infty. \tag{9.3.15}$$

Since inequalities of the form (9.3.14) and (9.3.15) always hold, by the exponential Chebyshev inequality, for the right tails

$$F_+(x) = \int_x^\infty f(u)\, du,$$

condition [D] is not too restrictive. It only eliminates sharp “bursts” of $f(x)$ as $x \to \infty$.

Denote by $f_n(x)$ the density of the distribution of $S_n$.


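Theorem 9.3.3 Let

$$\mathbf{E}\xi = 0, \qquad \mathbf{E}\xi^2 < \infty, \qquad \lambda_+ > 0, \qquad \alpha = \frac{x}{n} \in [0, \alpha_+),$$

and condition [D] be met. Then

$$f_n(x) = \frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha \sqrt{2\pi n}} (1 + o(1)),$$

where the remainder term $o(1)$ is uniform in $\alpha \in [0, \alpha_1]$ for any fixed $\alpha_1 \in (0, \alpha_+)$.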


Proof The proof is based on Theorems 9.2.1 and 8.7.2A. Denote by $f_n^{(\alpha)}(x)$ the density of the distribution of $S_n^{(\alpha)}$. Relation (9.2.3) implies that, for $x = \alpha n$, $\alpha \in [\alpha_-, \alpha_+]$, we have

$$f_n(x) = e^{-\lambda(\alpha) x}\, \psi^n(\lambda(\alpha))\, f_n^{(\alpha)}(x) = e^{-n\Lambda(\alpha)} f_n^{(\alpha)}(x). \tag{9.3.16}$$

Since $\mathbf{E}\xi^{(\alpha)} = \alpha$, we see that $\mathbf{E}(S_n^{(\alpha)} - x) = 0$ and the density value $f_n^{(\alpha)}(x)$ coincides with the density of the distribution of the sum $S_n^{(\alpha)} - \alpha n$ at the point $0$. In order to use Theorems 8.7.1A and 8.7.2A, we have to verify conditions (a) and (b) for $\theta_2 = \infty$ in these theorems and also the uniform boundedness in $\alpha \in [0, \alpha_1]$ of

$$\int \big|\varphi_{(\lambda(\alpha))}(t)\big|^m\, dt \tag{9.3.17}$$

for some integer $m \ge 1$, where $\varphi_{(\lambda(\alpha))}$ is the ch.f. of $\xi^{(\alpha)}$ (the uniform version of condition (c) in Theorem 8.7.2). By condition [D] the density

$$f^{(\alpha)}(v) = \frac{e^{\lambda(\alpha) v} f(v)}{\psi(\lambda(\alpha))}$$

is bounded uniformly in $\alpha \in [0, \alpha_1]$ (for such $\alpha$ one has $\lambda(\alpha) \in [0, \lambda_1]$, $\lambda_1 = \lambda(\alpha_1) < \lambda_+$). Hence the integral

$$\int \big(f^{(\alpha)}(v)\big)^2\, dv$$

is also uniformly bounded, and so, by virtue of Parseval's identity (see Sect. 7.2), is the integral

$$\int \big|\varphi_{(\lambda(\alpha))}(t)\big|^2\, dt.$$

This means that the required uniform boundedness of integral (9.3.17) is proved for $m = 2$.

Conditions (a) and (b) for $\theta_2 < \infty$ were verified in the proof of Theorem 9.3.1. It remains to extend the verification of condition (b) to the case $\theta_2 = \infty$. This can be done by following an argument very similar to the one used in the proof of Theorem 9.3.1 in the case of finite $\theta_2$. Let $\theta_2 = \infty$. If we assume that there exist sequences $\lambda_k \in [0, \lambda_{+,\varepsilon}]$ and $|t_k| \ge \theta_1$ such that

$$\frac{|\psi(\lambda_k + it_k)|}{\psi(\lambda_k)} \to 1,$$

then, by compactness of $[0, \lambda_{+,\varepsilon}]$, there will exist sequences $\lambda'_k \to \lambda_0 \in [0, \lambda_{+,\varepsilon}]$ and $t'_k$ such that

$$\frac{|\psi(\lambda'_k + it'_k)|}{\psi(\lambda_0)} \to 1. \tag{9.3.18}$$

But by virtue of condition [D] the family of functions $\psi(\lambda + it)$, $t \in \mathbb{R}$, is equicontinuous in $\lambda \in [0, \lambda_{+,\varepsilon}]$. Therefore, along with (9.3.18), we also have convergence

$$\frac{|\psi(\lambda_0 + it'_k)|}{\psi(\lambda_0)} \to 1, \qquad |t'_k| \ge \theta_1 > 0,$$

which contradicts the inequality

$$\sup_{|t| \ge \theta_1} \frac{|\psi(\lambda_0 + it)|}{\psi(\lambda_0)} < 1$$

that follows from the existence of density.

Thus property (b) is proved for $\theta_2 = \infty$, and we can use Theorem 8.7.2A, which implies that

$$f_n^{(\alpha)}(x) = \frac{1}{\sigma(\lambda(\alpha)) \sqrt{2\pi n}} (1 + o(1)).$$

This, together with (9.3.16), proves Theorem 9.3.3. □
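
The local asymptotics of Theorem 9.3.3 can be compared with an exact density in a simple case. The sketch below relies on an assumed example that is not in the text: $\xi = \eta - 1$ with $\eta$ exponential with unit rate, whose density $e^{-(x+1)}$, $x \ge -1$, is bounded and satisfies (9.3.14) with $\lambda_+ = 1$. Then $S_n + n$ has the Gamma density with shape $n$, $\Lambda(\alpha) = \alpha - \ln(1+\alpha)$ and $\sigma_\alpha = 1 + \alpha$. The function names are hypothetical.

```python
import math

def log_density_exact(n: int, alpha: float) -> float:
    """ln f_n(x) at x = alpha * n, where S_n + n ~ Gamma(n, 1) (assumed example)."""
    t = n * (1.0 + alpha)
    return (n - 1) * math.log(t) - t - math.lgamma(n)

def log_density_local(n: int, alpha: float) -> float:
    """ln of exp(-n * Lambda(alpha)) / (sigma_alpha * sqrt(2 * pi * n))."""
    rate = alpha - math.log1p(alpha)   # Lambda(alpha) for this example
    sigma_alpha = 1.0 + alpha          # standard deviation of the tilted summand
    return -n * rate - math.log(sigma_alpha * math.sqrt(2.0 * math.pi * n))

if __name__ == "__main__":
    for n in (10, 100, 1000):
        # The difference of logarithms is of order 1/n (Stirling's correction).
        print(n, log_density_exact(n, 0.5) - log_density_local(n, 0.5))
```

In this example the error does not depend on $\alpha$ at all, which agrees with the uniformity of the remainder term in $\alpha \in [0, \alpha_1]$.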

Remark 9.3.1 We can see from the proof that, in Theorem 9.3.3, as a more general condition instead of condition [D] one could also consider the integrability of $\psi^m(\lambda + it)$ for any fixed $\lambda \in [0, \lambda_1]$, $\lambda_1 < \lambda_+$, or condition [D] imposed on $S_m$ for some $m \ge 1$.

For arithmetic distributions we cannot assume without loss of generality that $m_1 = \mathbf{E}\xi = 0$, but that does not change much in the formulations of the assertions. If $\lambda_+ > 0$, then $\alpha_+ = \psi'(\lambda_+)/\psi(\lambda_+) > m_1$ and the scaled deviations $\alpha = x/n$ for the Cramér range must lie in the region $[m_1, \alpha_+)$.

Theorem 9.3.4 Let $\lambda_+ > 0$, $\mathbf{E}\xi^2 < \infty$ and the distribution of $\xi$ be arithmetic. Then, for integer $x$,

$$\mathbf{P}(S_n = x) = \frac{e^{-n\Lambda(\alpha)}}{\sigma_\alpha \sqrt{2\pi n}} (1 + o(1)),$$

where the remainder term $o(1)$ is uniform in $\alpha = x/n \in [m_1, \alpha_1]$ for any fixed $\alpha_1 \in (m_1, \alpha_+)$.

A similar assertion is valid in the case when $\lambda_- < 0$ and $\alpha \in (\alpha_-, m_1]$.

Proof The proof does not differ much from that of Theorem 9.3.1. By (9.2.3),

$$\mathbf{P}(S_n = x) = e^{-\lambda(\alpha) x}\, \psi^n(\lambda(\alpha))\, \mathbf{P}(S_n^{(\alpha)} = x) = e^{-n\Lambda(\alpha)}\, \mathbf{P}(S_n^{(\alpha)} = x),$$

where $\mathbf{E}\xi^{(\alpha)} = \alpha$ for $\alpha \in [m_1, \alpha_+)$. In order to compute $\mathbf{P}(S_n^{(\alpha)} = x)$ we have to use Theorem 8.7.3A. The verification of conditions (a) and (b) of Theorem 8.7.1A, which are assumed to hold in Theorem 8.7.3A, is done in the same way as in the proof of Theorem 9.3.1, the only difference being that relation (9.3.4) for $t_0 \in [\theta_1, \pi]$ will contradict the arithmeticity of the distribution of $\xi$. Since $a(\lambda(\alpha)) = \mathbf{E}\xi^{(\alpha)} = \alpha$, by Theorem 8.7.3A we have

$$\mathbf{P}(S_n^{(\alpha)} = x) = \frac{1}{\sigma_\alpha \sqrt{2\pi n}} (1 + o(1))$$

uniformly in $\alpha = x/n \in [m_1, \alpha_1]$. The theorem is proved. □
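
For an arithmetic example, the approximation of Theorem 9.3.4 can be compared with exact binomial probabilities. The sketch below assumes Bernoulli summands with parameter $p$ (an illustration that is not taken from the text; the function names are hypothetical). Here $m_1 = p$, $\alpha_+ = 1$, the Cramér transform with mean $\alpha$ is again Bernoulli with parameter $\alpha$, so $\sigma_\alpha^2 = \alpha(1-\alpha)$, and $\Lambda(\alpha) = \alpha \ln(\alpha/p) + (1-\alpha)\ln((1-\alpha)/(1-p))$.

```python
import math

def log_pmf_exact(n: int, x: int, p: float) -> float:
    """ln P(S_n = x) for S_n ~ Binomial(n, p), computed via log-gamma."""
    return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + x * math.log(p) + (n - x) * math.log(1 - p))

def log_pmf_local(n: int, x: int, p: float) -> float:
    """ln of exp(-n * Lambda(alpha)) / (sigma_alpha * sqrt(2 * pi * n)), alpha = x / n."""
    alpha = x / n
    rate = alpha * math.log(alpha / p) + (1 - alpha) * math.log((1 - alpha) / (1 - p))
    sigma_alpha = math.sqrt(alpha * (1 - alpha))   # std of the tilted Bernoulli summand
    return -n * rate - math.log(sigma_alpha * math.sqrt(2.0 * math.pi * n))

if __name__ == "__main__":
    p = 0.5
    for n in (10, 100, 1000):
        x = (7 * n) // 10   # alpha = 0.7 lies in [m_1, alpha_1] with alpha_1 < 1
        print(n, log_pmf_exact(n, x, p) - log_pmf_local(n, x, p))
```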
9.4 Integro-Local Theorems at the Boundary of the Cramér
Range

9.4.1 Introduction

In this section we again assume that Cramér's condition $\lambda_+ > 0$ is met. If $\alpha_+ = \infty$ then the theorems of Sect. 9.3 describe the large deviation probabilities for any $\alpha = x/n$. But if $\alpha_+ < \infty$ then the approaches of Sect. 9.3 do not enable one to find the asymptotics of probabilities of large deviations of $S_n$ for scaled deviations $\alpha = x/n$ in the vicinity of the point $\alpha_+$.

In this section we consider the case $\alpha_+ < \infty$. If in this case $\lambda_+ = \infty$, then, by property ($\Lambda 2$)(i), we have $\alpha_+ = s_+ = \sup\{t : F_+(t) > 0\}$, and therefore the random variables $\xi_k$ are bounded from above by the value $\alpha_+$, $\mathbf{P}(S_n > x) = 0$ for $\alpha = x/n > \alpha_+$. We will not consider this case in what follows. Thus we will study the case $\lambda_+ < \infty$, $\alpha_+ < \infty$.

In the present and the next sections, we will confine ourselves to considering integro-local theorems in the non-lattice case with $\Delta = \Delta_n \to 0$ since, as we saw in the previous section, local theorems differ from the integro-local theorems only in that they are simpler. As in Sect. 9.3, the integral theorems can be easily obtained from the integro-local theorems.


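9.4.2 The Probabilities of Large Deviations of $S_n$ in an
$o(n)$-Vicinity of the Point $\alpha_+ n$; the Case $\psi''(\lambda_+) < \infty$

In this subsection we will study the asymptotics of $\mathbf{P}(S_n \in \Delta[x))$, $x = \alpha n$, when $\alpha$ lies in the vicinity of the point $\alpha_+ < \infty$ and, moreover, $\psi''(\lambda_+) < \infty$. (The case of distributions $\mathbf{F}$, for which $\lambda_+ < \infty$, $\alpha_+ < \infty$ and $\psi''(\lambda_+) < \infty$, will be illustrated later, in Lemma 9.4.1.) Under the above-mentioned conditions, the Cramér transform $\mathbf{F}_{(\lambda_+)}$ is well defined at the point $\lambda_+$, and the random variable $\xi^{(\alpha_+)}$ with the distribution $\mathbf{F}_{(\lambda_+)}$ has mean $\alpha_+$ and a finite variance:

$$\mathrm{Var}(\xi^{(\alpha_+)}) = \sigma_{\alpha_+}^2 = \frac{\psi''(\lambda_+)}{\psi(\lambda_+)} - \alpha_+^2 \tag{9.4.1}$$

(cf. (9.3.1)).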


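Theorem 9.4.1 Let $\xi$ be a non-lattice random variable,

$$\lambda_+ \in (0, \infty), \qquad \psi''(\lambda_+) < \infty, \qquad y = x - \alpha_+ n = o(n).$$

If $\Delta_n \to 0$ slowly enough as $n \to \infty$ then

$$\mathbf{P}(S_n \in \Delta_n[x)) = \frac{\Delta_n}{\sigma_{\alpha_+} \sqrt{2\pi n}}\, e^{-n\Lambda(\alpha_+) - \lambda_+ y} \Big(\exp\Big\{-\frac{y^2}{2\sigma_{\alpha_+}^2 n}\Big\} + o(1)\Big),$$

where

$$\alpha = \frac{x}{n}, \qquad \sigma_{\alpha_+}^2 = \frac{\psi''(\lambda_+)}{\psi(\lambda_+)} - \alpha_+^2,$$

and the remainder term $o(1)$ is uniform in $y$.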


Proof As in the proof of Theorem 9.3.1, we use the Cramér transform, but now at the fixed point $\lambda_+$, so there will be no triangular array scheme when analysing the sums $S_n^{(\alpha_+)}$. In this case the following analogue of Theorem 9.2.1 holds true.

Theorem 9.2.1A Let $\lambda_+ \in (0, \infty)$, $\alpha_+ < \infty$ and $y = x - n\alpha_+$. Then, for $x = n\alpha$ and any fixed $\Delta > 0$, the following representation is valid:

$$\mathbf{P}(S_n \in \Delta[x)) = e^{-n\Lambda(\alpha_+) - \lambda_+ y} \int_0^\Delta e^{-\lambda_+ z}\, \mathbf{P}(S_n^{(\alpha_+)} - \alpha n \in dz). \tag{9.4.2}$$

Proof of Theorem 9.2.1A repeats that of Theorem 9.2.1, the only difference being that, as was already noted, the Cramér transform is now applied at the fixed point $\lambda_+$, which does not depend on $\alpha = x/n$. In this case, by (9.2.3),

$$\mathbf{P}(S_n \in dv) = e^{-\lambda_+ v + n \ln \psi(\lambda_+)}\, \mathbf{P}(S_n^{(\alpha_+)} \in dv) = e^{-n\Lambda(\alpha_+) + \lambda_+(\alpha_+ n - v)}\, \mathbf{P}(S_n^{(\alpha_+)} \in dv).$$

Integrating this equality in $v$ from $x$ to $x + \Delta$, changing the variable $v = x + z$ ($x = n\alpha$), and noting that $\alpha_+ n - v = -y - z$, we obtain (9.4.2).

The theorem is proved. □


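Let us return to the proof of Theorem 9.4.1. Assuming that $\Delta = \Delta_n \to 0$, we obtain, by Theorem 9.2.1A, that

$$\mathbf{P}(S_n \in \Delta_n[x)) = e^{-n\Lambda(\alpha_+) - \lambda_+ y}\, \mathbf{P}(S_n^{(\alpha_+)} - \alpha_+ n \in \Delta_n[y))(1 + o(1)). \tag{9.4.3}$$

By virtue of (9.4.1), we can apply Theorem 8.7.1 to evaluate the probability on the right-hand side of (9.4.3). This theorem implies that, as $\Delta_n \to 0$ slowly enough,

$$\begin{aligned}
\mathbf{P}(S_n^{(\alpha_+)} - \alpha_+ n \in \Delta_n[y)) &= \frac{\Delta_n}{\sigma_{\alpha_+} \sqrt{n}}\, \varphi\Big(\frac{y}{\sigma_{\alpha_+} \sqrt{n}}\Big) + o\Big(\frac{\Delta_n}{\sqrt{n}}\Big) \\
&= \frac{\Delta_n}{\sigma_{\alpha_+} \sqrt{2\pi n}} \Big(\exp\Big\{-\frac{y^2}{2\sigma_{\alpha_+}^2 n}\Big\} + o(1)\Big)
\end{aligned}$$

uniformly in $y$. This, together with (9.4.3), proves Theorem 9.4.1. □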
9.4.3 The Class of Distributions $\mathcal{ER}$. The Probability of Large
Deviations of $S_n$ in an $o(n)$-Vicinity of the Point $\alpha_+ n$ for
Distributions $\mathbf{F}$ from the Class $\mathcal{ER}$ in Case $\psi''(\lambda_+) = \infty$

When studying the asymptotics of $\mathbf{P}(S_n \ge \alpha n)$ (or $\mathbf{P}(S_n \in \Delta[\alpha n))$) in the case where $\psi''(\lambda_+) = \infty$ and $\alpha$ is in the vicinity of the point $\alpha_+ < \infty$, we have to impose additional conditions on the distribution $\mathbf{F}$ similarly to what was done in Sect. 8.8 when studying convergence to stable laws.

To formulate these additional conditions it will be convenient to introduce certain classes of distributions. If $\lambda_+ < \infty$, then it is natural to represent the right tails $F_+(t)$ as

$$F_+(t) = e^{-\lambda_+ t} V(t), \tag{9.4.4}$$

where, by the exponential Chebyshev inequality, $V(t) = e^{o(t)}$ as $t \to \infty$.

Definition 9.4.1 We will say that the distribution $\mathbf{F}$ of a random variable $\xi$ (or the random variable $\xi$ itself) belongs to the class $\mathcal{R}$ if its right tail $F_+(t)$ is a regularly varying function, i.e. can be represented as

$$F_+(t) = t^{-\beta} L(t), \tag{9.4.5}$$

where $L$ is a slowly varying function as $t \to \infty$ (see also Sect. 8.8 and Appendix 6).

We will say that the distribution $\mathbf{F}$ (or the random variable $\xi$) belongs to the class $\mathcal{ER}$ if, in the representation (9.4.4), the function $V$ is regularly varying (which will also be denoted as $V \in \mathcal{R}$).

Distributions from the class $\mathcal{R}$ have already appeared in Sect. 8.8.

The following assertion explains which distributions from $\mathcal{ER}$ correspond to the cases $\alpha_+ = \infty$, $\alpha_+ < \infty$, $\psi''(\lambda_+) = \infty$ and $\psi''(\lambda_+) < \infty$.

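Lemma 9.4.1 Let $\mathbf{F} \in \mathcal{ER}$. For $\alpha_+$ to be finite it is necessary and sufficient that

$$\int_1^\infty t V(t)\, dt < \infty.$$

For $\psi''(\lambda_+)$ to be finite, it is necessary and sufficient that

$$\int_1^\infty t^2 V(t)\, dt < \infty.$$

The assertion of the lemma means that $\alpha_+ < \infty$ if $\beta > 2$ in the representation $V(t) = t^{-\beta} L(t)$, where $L$ is an s.v.f., and $\alpha_+ = \infty$ if $\beta < 2$. For $\beta = 2$, the finiteness of $\alpha_+$ is equivalent to the finiteness of $\int_1^\infty t^{-1} L(t)\, dt$. The same is true for the finiteness of $\psi''(\lambda_+)$, with $\beta = 3$ playing the role of the critical exponent.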
Proof of Lemma 9.4.1 We first prove the assertion concerning $\alpha_+$. Since

$$\alpha_+ = \frac{\psi'(\lambda_+)}{\psi(\lambda_+)},$$

we have to estimate the values of $\psi'(\lambda_+)$ and $\psi(\lambda_+)$. The finiteness of $\psi'(\lambda_+)$ is equivalent to that of

$$-\int_1^\infty t e^{\lambda_+ t}\, dF_+(t) = \int_1^\infty t\big(\lambda_+ V(t)\, dt - dV(t)\big), \tag{9.4.6}$$

where, for $V(t) = o(1/t)$,

$$-\int_1^\infty t\, dV(t) = V(1) + \int_1^\infty V(t)\, dt.$$

Hence the finiteness of the integral on the left-hand side of (9.4.6) is equivalent to that of the sum

$$\lambda_+ \int_1^\infty t V(t)\, dt + \int_1^\infty V(t)\, dt$$

or, which is the same, to the finiteness of the integral $\int_1^\infty t V(t)\, dt$. Similarly we see that the finiteness of $\psi(\lambda_+)$ is equivalent to that of $\int_1^\infty V(t)\, dt$. This implies the assertion of the lemma in the case $\int_1^\infty V(t)\, dt < \infty$, where one has $V(t) = o(1/t)$. If $\int_1^\infty V(t)\, dt = \infty$, then $\psi(\lambda_+) = \infty$, $\ln \psi(\lambda) \to \infty$ as $\lambda \uparrow \lambda_+$ and hence $\alpha_+ = \lim_{\lambda \uparrow \lambda_+} (\ln \psi(\lambda))' = \infty$.

The assertion concerning $\psi''(\lambda_+)$ can be proved in exactly the same way. The lemma is proved. □


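The lemma implies the following:

(a) If $\beta < 2$ or $\beta = 2$ and $\int_1^\infty t^{-1} L(t)\, dt = \infty$, then $\alpha_+ = \infty$ and the theorems of the previous section are applicable to $\mathbf{P}(S_n \ge x)$.

(b) If $\beta > 3$ or $\beta = 3$ and $\int_1^\infty t^{-1} L(t)\, dt < \infty$, then $\alpha_+ < \infty$, $\psi''(\lambda_+) < \infty$ and we can apply Theorem 9.4.1.

It remains to consider the case

(c) $\beta \in [2, 3]$, where the integral $\int_1^\infty t^{-1} L(t)\, dt$ is finite for $\beta = 2$ and is infinite for $\beta = 3$.

It is obvious that in case (c) we have $\alpha_+ < \infty$ and $\psi''(\lambda_+) = \infty$.

Put

$$V_+(t) := \frac{\lambda_+ t V(t)}{(\beta - 1)\psi(\lambda_+)}, \qquad b(n) := V_+^{(-1)}\Big(\frac{1}{n}\Big),$$

where $V_+^{(-1)}(1/n)$ is the value of the function inverse to $V_+$ at the point $1/n$.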
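
The normalising sequence $b(n)$ is defined only implicitly, as the value at $1/n$ of the function inverse to $V_+$. The sketch below shows one way to compute it numerically. It assumes, purely for illustration, a pure power form $V_+(t) = c\, t^{1-\beta}$ with a constant slowly varying part; the constant $c$ and the function names are hypothetical and stand in for the constant factor in the definition of $V_+$. In this special case $b(n) = (cn)^{1/(\beta-1)}$ is available in closed form for comparison.

```python
def v_plus(t: float, c: float = 0.8, beta: float = 2.5) -> float:
    """Hypothetical V_+(t) = c * t^(1 - beta): the slowly varying part is taken constant."""
    return c * t ** (1.0 - beta)

def b(n: int, f=v_plus) -> float:
    """Value at 1/n of the inverse of a continuous decreasing f with f(1) >= 1/n."""
    target = 1.0 / n
    lo, hi = 1.0, 2.0
    while f(hi) > target:     # expand the bracket until f(hi) <= 1/n
        hi *= 2.0
    for _ in range(200):      # bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    n = 10 ** 4
    # Closed form for the pure power case: (c * n) ** (1 / (beta - 1)).
    print(b(n), (0.8 * n) ** (1.0 / 1.5))
```

For $\beta \in (2, 3)$ the sequence $b(n)$ grows like a power of $n$ with exponent $1/(\beta-1) \in (1/2, 1)$, that is, faster than $\sqrt{n}$ but slower than $n$.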
Theorem 9.4.2 Let $\xi$ be a non-lattice random variable, $\mathbf{F} \in \mathcal{ER}$ and condition (c) hold. If $\Delta_n \to 0$ slowly enough as $n \to \infty$, then, for $y = x - \alpha_+ n = o(n)$,

$$\mathbf{P}(S_n \in \Delta_n[x)) = \frac{\Delta_n\, e^{-n\Lambda(\alpha_+) - \lambda_+ y}}{b(n)} \Big(f^{(\beta-1,1)}\Big(\frac{y}{b(n)}\Big) + o(1)\Big),$$

where $f^{(\beta-1,1)}$ is the density of the stable law $\mathbf{F}_{\beta-1,1}$ with parameters $\beta - 1$, $1$, and the remainder term $o(1)$ is uniform in $y$.

We will see from the proof of the theorem that studying the probabilities of large deviations in the case where $\alpha_+ < \infty$ and $\psi''(\lambda_+) = \infty$ is basically impossible outside the class $\mathcal{ER}$, since it is impossible to find theorems on the limiting distribution of $S_n$ in the case $\mathrm{Var}(\xi) = \infty$ without the conditions $[\mathbf{R}_{\beta,\rho}]$ of Sect. 8.8 being satisfied.


Proof of Theorem 9.4.2 Condition (c) implies that $\alpha_+ = \mathbf{E}\xi^{(\alpha_+)} < \infty$ and $\mathrm{Var}(\xi^{(\alpha_+)}) = \infty$. We will use Theorem 9.2.1A. For $\Delta_n \to 0$ slowly enough we will obtain, as in the proof of Theorem 9.4.1, that relation (9.4.3) holds true. But now, in contrast to Theorem 9.4.1, in order to calculate the probability on the right-hand side of (9.4.3), we have to employ the integro-local Theorem 8.8.3 on convergence to a stable law. In our case, by the properties of r.v.f.s, one has

$$\begin{aligned}
\mathbf{P}(\xi^{(\alpha_+)} \ge t) &= -\frac{1}{\psi(\lambda_+)} \int_t^\infty e^{\lambda_+ u}\, dF_+(u) = \frac{1}{\psi(\lambda_+)} \int_t^\infty \big(\lambda_+ V(u)\, du - dV(u)\big) \\
&\sim \frac{\lambda_+}{(\beta - 1)\psi(\lambda_+)}\, t^{-\beta+1} L_+(t) \sim V_+(t),
\end{aligned} \tag{9.4.7}$$

where $L_+(t) \sim L(t)$ is a slowly varying function. Moreover, the left tail of the distribution $\mathbf{F}_{(\lambda_+)}$ decays at least exponentially fast. By virtue of the results of Sect. 8.8, this means that, for $b(n) = V_+^{(-1)}(1/n)$, we have convergence of the distributions of $(S_n^{(\alpha_+)} - \alpha_+ n)/b(n)$ to the stable law $\mathbf{F}_{\beta-1,1}$ with parameters $\beta - 1 \in [1, 2]$ and $1$. It remains to use representation (9.4.3) and Theorem 8.8.3 which implies that, provided $\Delta_n \to 0$ slowly enough, one has

$$\mathbf{P}(S_n^{(\alpha_+)} - \alpha_+ n \in \Delta_n[y)) = \frac{\Delta_n}{b(n)}\, f^{(\beta-1,1)}\Big(\frac{y}{b(n)}\Big) + o\Big(\frac{\Delta_n}{b(n)}\Big)$$

uniformly in $y$. The theorem is proved. □

Theorem 9.4.2 concludes the study of probabilities of large deviations of $S_n/n$ in the vicinity of the point $\alpha_+$ for distributions from the class $\mathcal{ER}$.
9.4.4 On the Large Deviation Probabilities in the Range $\alpha > \alpha_+$
for Distributions from the Class $\mathcal{ER}$

Now assume that the deviations $x$ of $S_n$ are such that $\alpha = x/n > \alpha_+$, and $y = x - \alpha_+ n$ grows fast enough (faster than $\sqrt{n}$ under the conditions of Theorem 9.4.1 and faster than $b(n)$ under the conditions of Theorem 9.4.2). Then, for the probability

$$\mathbf{P}(S_n^{(\alpha_+)} - \alpha_+ n \in \Delta_n[y)), \tag{9.4.8}$$

the deviations $y$ (see representation (9.4.3)) will belong to the zone of large deviations, so applying Theorems 8.7.1 and 8.8.3 to evaluate such probabilities does not make much sense. Relation (9.4.7) implies that, in the case $\mathbf{F} \in \mathcal{ER}$, we have $\mathbf{F}_{(\lambda_+)} \in \mathcal{R}$. Therefore, we will know the asymptotics of the probability (9.4.8) (and hence also of the probability $\mathbf{P}(S_n \in \Delta_n[x))$, see (9.4.3)) if we obtain integro-local theorems for the probabilities of large deviations of the sums $S_n$, in the case where the summands belong to the class $\mathcal{R}$. Such theorems are also of independent interest in the present chapter, and the next section will be devoted to them. After that, in Sect. 9.6 we will return to the problem on large deviation probabilities in the class $\mathcal{ER}$ mentioned in the title of this section.


9.5 Integral and Integro-Local Theorems on Large Deviation
Probabilities for Sums $S_n$ when the Cramér Condition Is not
Met

If $\mathbf{E}\xi = 0$ and the right-side Cramér condition is not met ($\lambda_+ = 0$), then the rate function $\Lambda(\alpha)$ degenerates on the right semiaxis: $\Lambda(\alpha) = \lambda(\alpha) = 0$ for $\alpha \ge 0$, and the results of Sects. 9.1-9.4 on the probabilities of large deviations of $S_n$ are of little substance. In this case, in order to find the asymptotics of $\mathbf{P}(S_n \ge x)$ and $\mathbf{P}(S_n \in \Delta[x))$, we need completely different approaches, while finding these asymptotics is only possible under additional conditions on the behaviour of the tail $F_+(t)$ of the distribution $\mathbf{F}$, similarly to what happened in Sect. 8.8 when studying convergence to stable laws.

The above-mentioned additional conditions consist of the assumption that the tail $F_+(t)$ behaves regularly enough. In this section we will assume that $F_+(t) = V(t) \in \mathcal{R}$, where $\mathcal{R}$ is the class of regularly varying functions introduced in the previous section (see also Appendix 6). To make the exposition more homogeneous, we will confine ourselves to the case $\beta > 2$, $\mathrm{Var}(\xi) < \infty$, where $-\beta$ is the power exponent in the function $V \in \mathcal{R}$ (see (9.4.5)). Studying the case $\beta \in [1, 2]$ ($\mathrm{Var}(\xi) = \infty$) does not differ much from the exposition below, but it would significantly increase the volume of the exposition and complicate the text, and therefore is omitted. Results for the case $\beta \in (0, 2]$ can be found in [8, Chap. 3].
9.5.1 Integral Theorems

Integral theorems for probabilities of large deviations of $S_n$ and maxima $\overline{S}_n = \max_{k \le n} S_k$ in the case $\mathbf{E}\xi = 0$, $\mathrm{Var}(\xi) < \infty$, $\mathbf{F} \in \mathcal{R}$, $\beta > 2$, follow immediately from the bounds obtained in Appendix 8. In particular, Corollaries A8.2.1 and A8.3.1 of Appendix 8 imply the following result.

Theorem 9.5.1 Let $\mathbf{E}\xi = 0$, $\mathrm{Var}(\xi) < \infty$, $\mathbf{F} \in \mathcal{R}$ and $\beta > 2$. Then, for $x \gg \sqrt{n \ln n}$,

$$\mathbf{P}(\overline{S}_n \ge x) \sim \mathbf{P}(S_n \ge x) \sim nV(x). \tag{9.5.1}$$

Under an additional condition $[\mathbf{D}_0]$ to be introduced below, the assertion of this theorem will also follow from the integro-local Theorem 9.5.2 (see below).

Comparing Theorem 9.5.1 with the results of Sects. 9.2-9.4 shows that the nature of the large deviation probabilities is completely different here. Under the Cramér condition and for $\alpha = x/n \in (0, \alpha_+)$, the large deviations of $S_n$ are, roughly speaking, “equally contributed to by all the summands” $\xi_k$, $k \le n$. This is confirmed by the fact that, for a fixed $\alpha$, the limiting conditional distribution of $\xi_k$, $k \le n$, given that $S_n \in \Delta[x)$ (or $S_n \ge x$) for $x = \alpha n$, $\Delta = 1$, as $n \to \infty$ coincides with the distribution $\mathbf{F}^{(\alpha)}$ of the random variable $\xi^{(\alpha)}$. The reader can verify this himself/herself using Theorem 9.3.2. In other words, the conditions $\{S_n \in \Delta[x)\}$ (or $\{S_n \ge x\}$), $x = \alpha n$, change equally (from $\mathbf{F}$ to $\mathbf{F}^{(\alpha)}$) the distributions of all the summands.

However, if the Cramér condition is not met, then under the conditions of Theorem 9.5.1 the large deviations of $S_n$ are essentially due to one large (comparable with $x$) jump. This is seen from the fact that the value of $nV(x)$ on the right-hand side of (9.5.1) is nothing else but the main term of the asymptotics for $\mathbf{P}(\overline{\xi}_n \ge x)$, where $\overline{\xi}_n = \max_{k \le n} \xi_k$. Indeed, if $nV(x) \to 0$ then

$$\mathbf{P}(\overline{\xi}_n < x) = (1 - V(x))^n = 1 - nV(x) + O((nV(x))^2),$$

$$\mathbf{P}(\overline{\xi}_n \ge x) = nV(x) + O((nV(x))^2) \sim nV(x).$$

In other words, the probabilities of large deviations of $S_n$, $\overline{S}_n$ and $\overline{\xi}_n$ are asymptotically the same. The fact that the probabilities of the events $\{\xi_j \ge y\}$ for $y \sim x$ play the determining role in finding the asymptotics of $\mathbf{P}(S_n \ge x)$ can easily be discovered in the bounds from Appendix 8.

Thus, while the asymptotics of $\mathbf{P}(S_n \ge x)$ for $x = \alpha n \gg \sqrt{n}$ in the Cramér case is determined by “the whole distribution $\mathbf{F}$” (as the rate function $\Lambda(\alpha)$ depends on “the whole distribution $\mathbf{F}$”), these asymptotics in the case $\mathbf{F} \in \mathcal{R}$ are determined by the right tail $F_+(t) = V(t)$ only and do not depend on the “remaining part” of the distribution $\mathbf{F}$ (for the fixed value of $\mathbf{E}\xi = 0$).
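The "one large jump" heuristic can be checked numerically. The following is a minimal sketch, assuming a hypothetical pure Pareto tail $V(x) = x^{-\beta}$ for $x \ge 1$ (the function names and the parameter values are illustrative and not taken from the text); it compares the exact value of $\mathbf{P}(\overline{\xi}_n \ge x) = 1 - (1 - V(x))^n$ with its main term $nV(x)$.

```python
import math


def pareto_tail(x: float, beta: float) -> float:
    """Hypothetical regularly varying tail V(x) = x**(-beta) for x >= 1."""
    return x ** (-beta) if x >= 1.0 else 1.0


def max_exceeds(n: int, v: float) -> float:
    """P(max of n i.i.d. summands >= x) = 1 - (1 - v)**n, where v = V(x)."""
    if v >= 1.0:
        return 1.0
    # expm1/log1p keep precision when v is tiny
    return -math.expm1(n * math.log1p(-v))


def compare(n: int, x: float, beta: float) -> tuple[float, float]:
    """Return (exact probability, main term n*V(x))."""
    v = pareto_tail(x, beta)
    return max_exceeds(n, v), n * v


if __name__ == "__main__":
    for x in (50.0, 200.0, 1000.0):
        exact, main_term = compare(1000, x, 3.0)
        print(x, exact, main_term, exact / main_term)
```

The difference between the two values is of order $(nV(x))^2$, in line with the expansion above, so the ratio approaches 1 as $nV(x) \to 0$.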
9.5.2  Integro-Local  Theorems


In this section we will study the asymptotics of $\mathbf{P}(S_n \in \Delta[x))$ in the case where

$$\mathbf{E}\xi = 0, \quad \mathrm{Var}(\xi) < \infty, \quad \mathbf{F} \in \mathcal{R}, \quad \beta > 2, \quad x \gg \sqrt{n \ln n}. \tag{9.5.2}$$

These asymptotics are of independent interest and are also useful, for example, in finding the asymptotics of integrals of type $\mathbf{E}(g(S_n);\ S_n \ge x)$ for $x \gg \sqrt{n \ln n}$ for a wide class of functions $g$. As was already noted (see Subsection 9.4.4), in the next section we will use the results from the present section to obtain integro-local theorems under the Cramér condition (for summands from the class $\mathcal{ER}$) for deviations outside the Cramér zone.

In order to obtain integro-local theorems in this section, we will need additional conditions. Besides condition $\mathbf{F} \in \mathcal{R}$, we will also assume that the following holds:

Condition [Do] For each fixed $\Delta$, as $t \to \infty$,

$$V(t) - V(t + \Delta) = v(t)(\Delta + o(1)), \qquad v(t) = \frac{\beta V(t)}{t}.$$

It is clear that if the function $L(t)$ in representation (9.4.5) (or the function $V(t)$) is differentiable for $t$ large enough and $L'(t) = o(L(t)/t)$ as $t \to \infty$ (all sufficiently smooth s.v.f.s possess this property; cf. e.g., polynomials of $\ln t$ etc.), then condition [Do] will be satisfied, and the derivative $-V'(t) \sim v(t)$ will play the role of the function $v(t)$.


Theorem 9.5.2 Let conditions (9.5.2) and [Do] be met. Then

$$\mathbf{P}(S_n \in \Delta[x)) = \Delta n v(x)(1 + o(1)), \qquad v(x) = \frac{\beta V(x)}{x},$$

where the remainder term $o(1)$ is uniform in $x \ge N\sqrt{n \ln n}$ and $\Delta \in [\Delta_1, \Delta_2]$ for any fixed $\Delta_2 > \Delta_1 > 0$ and any fixed sequence $N \to \infty$.

Note that in Theorems 9.5.1 and 9.5.2 we do not assume that $n \to \infty$. The assumption that $x \to \infty$ is contained in (9.5.2).


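Proof For $y < x$, introduce the events

$$G_n := \{S_n \in \Delta[x)\}, \quad B_j := \{\xi_j < y\}, \quad B := \bigcap_{j=1}^{n} B_j. \tag{9.5.3}$$

Then

$$\overline{B} = \bigcup_{j=1}^{n} \overline{B}_j, \qquad \mathbf{P}(G_n) = \mathbf{P}(G_n B) + \mathbf{P}(G_n \overline{B}), \tag{9.5.4}$$

where

$$\sum_{j=1}^{n} \mathbf{P}(G_n \overline{B}_j) \ge \mathbf{P}(G_n \overline{B}) \ge \sum_{j=1}^{n} \mathbf{P}(G_n \overline{B}_j) - \sum_{i<j\le n} \mathbf{P}(G_n \overline{B}_i \overline{B}_j) \tag{9.5.5}$$

(see property 8 in Sect. 9.2.2).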

The proof is divided into three stages: the bounding of $\mathbf{P}(G_n B)$, that of $\mathbf{P}(G_n \overline{B}_i \overline{B}_j)$, $i \ne j$, and the evaluation of $\mathbf{P}(G_n \overline{B}_j)$.

(1) A bound on $\mathbf{P}(G_n B)$. We will make use of the rough inequality

$$\mathbf{P}(G_n B) \le \mathbf{P}(S_n \ge x;\ B) \tag{9.5.6}$$

and Theorem A8.2.1 of Appendix 8 which implies that, for $x = ry$ with a fixed $r > 2$, any $\delta > 0$, and $x \ge N\sqrt{n \ln n}$, $N \to \infty$, we have

$$\mathbf{P}(S_n \ge x;\ B) \le (nV(y))^{r-\delta}. \tag{9.5.7}$$

Here we can always choose $r$ such that

$$(nV(x))^{r-\delta} \ll n\Delta v(x) \tag{9.5.8}$$

for $x \gg \sqrt{n}$. Indeed, putting $n := x^2$ and comparing the powers of $x$ on the right-hand and left-hand sides of (9.5.8), we obtain that for (9.5.8) to hold it suffices to choose $r$ such that

$$(2 - \beta)(r - \delta) < 1 - \beta,$$

which is equivalent, for $\beta > 2$, to the inequality

$$r > \frac{\beta - 1}{\beta - 2}.$$

For such $r$, we will have that, by (9.5.6)-(9.5.8),

$$\mathbf{P}(G_n B) = o(n\Delta v(x)). \tag{9.5.9}$$

Since $r - \delta > 1$, we see that, for $n \ll x^2$, relations (9.5.8) and (9.5.9) will hold true all the more.

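(2) A bound for $\mathbf{P}(G_n \overline{B}_i \overline{B}_j)$. It is sufficient to bound $\mathbf{P}(G_n \overline{B}_{n-1} \overline{B}_n)$. Set

$$\delta := \frac{1}{r} < \frac{1}{2}, \qquad H_k := \{v : v < (1 - k\delta)x + \Delta\}, \quad k = 1, 2.$$

Then

$$\mathbf{P}(G_n \overline{B}_{n-1} \overline{B}_n) = \int_{H_2} \mathbf{P}(S_{n-2} \in dz) \int_{H_1} \mathbf{P}(z + \xi \in dv,\ \xi \ge \delta x)\, \mathbf{P}(v + \xi \in \Delta[x),\ \xi \ge \delta x). \tag{9.5.10}$$

Since in the domain $H_1$ we have $x - v > \delta x - \Delta$, the last factor on the right-hand side of (9.5.10) has, by condition [Do], the form $\Delta v(x - v)(1 + o(1)) \le c\Delta v(x)$ as $x \to \infty$, so the integral over $H_1$ in (9.5.10), for $x$ large enough, does not exceed

$$c\Delta v(x)\, \mathbf{P}(z + \xi \in H_1;\ \xi \ge \delta x) \le c\Delta v(x) V(\delta x).$$

The integral over the domain $H_2$ in (9.5.10) evidently allows a similar bound. Since $nV(x) \to 0$, we obtain that

$$\sum_{i<j\le n} \mathbf{P}(G_n \overline{B}_i \overline{B}_j) \le c_1 \Delta n^2 v(x) V(x) = o(\Delta n v(x)). \tag{9.5.11}$$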

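(3) The evaluation of $\mathbf{P}(G_n \overline{B}_j)$ is based on the relation

$$\mathbf{P}(G_n \overline{B}_n) = \int_{H_1} \mathbf{P}(S_{n-1} \in dz)\, \mathbf{P}(\xi \in \Delta[x - z),\ \xi \ge \delta x) \le \int_{H_1} \mathbf{P}(S_{n-1} \in dz)\, \mathbf{P}(\xi \in \Delta[x - z)) = \Delta \int_{H_1} \mathbf{P}(S_{n-1} \in dz)\, v(x - z)(1 + o(1)), \tag{9.5.12}$$

which yields

$$\mathbf{P}(G_n \overline{B}_n) \le \Delta\, \mathbf{E}[v(x - S_{n-1});\ S_{n-1} < (1 - \delta)x + \Delta](1 + o(1)) = \Delta v(x)(1 + o(1)). \tag{9.5.13}$$

The last relation is valid for $x \gg \sqrt{n}$, since, by Chebyshev's inequality, $\mathbf{E}[v(x - S_{n-1});\ |S_{n-1}| < M\sqrt{n}] \sim v(x)$ as $M \to \infty$, $M\sqrt{n} = o(x)$ and, moreover, the following evident bounds hold:

$$\mathbf{E}[v(x - S_{n-1});\ S_{n-1} \in (M\sqrt{n}, (1 - \delta)x + \Delta)] = o(v(x)),$$

$$\mathbf{E}[v(x - S_{n-1});\ S_{n-1} \in (-\infty, -M\sqrt{n})] = o(v(x))$$

as $M \to \infty$.

Similarly, by virtue of (9.5.12), we get

$$\mathbf{P}(G_n \overline{B}_n) \ge \int_{-\infty}^{(1-\delta)x} \mathbf{P}(S_{n-1} \in dz)\, \mathbf{P}(\xi \in \Delta[x - z)) \sim \Delta v(x). \tag{9.5.14}$$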
From (9.5.13) and (9.5.14) we obtain that

$$\mathbf{P}(G_n \overline{B}_n) = \Delta v(x)(1 + o(1)).$$

This, together with (9.5.4), (9.5.9) and (9.5.11), yields the representation

$$\mathbf{P}(G_n) = \Delta n v(x)(1 + o(1)).$$

The required uniformity of the term $o(1)$ clearly follows from the preceding argument. The theorem is proved. □

Theorem 9.5.2 implies the following

Corollary 9.5.1 Let the conditions of Theorem 9.5.2 be satisfied. Then there exists a fixed sequence $\Delta_N$ converging to zero slowly enough as $N \to \infty$ such that the assertion of Theorem 9.5.2 remains true when the segment $[\Delta_1, \Delta_2]$ is replaced in it with $[\Delta_N, \Delta_2]$.


9.6  Integro-Local  Theorems  on  the  Probabilities  of  Large 
Deviations  of  Sn  Outside  the  Cramer  Range  (Under  the 
Cramer  Condition) 

We return to the case where the Cramér condition is met. In Sects. 9.3 and 9.4 we obtained integro-local theorems for deviations inside and on the boundary of the Cramér range. It remains to study the asymptotics of $\mathbf{P}(S_n \in \Delta[x))$ outside the Cramér range, i.e. for $\alpha = x/n > \alpha_+$. Preliminary observations concerning this problem were made in Sect. 9.4.4 where it was reduced to integro-local theorems for the sums $S_n$ when Cramér's condition is not satisfied. Recall that in that case we had to restrict ourselves to considering distributions from the class $\mathcal{ER}$ defined in Sect. 9.4.3 (see (9.4.4)).

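Theorem 9.6.1 Let $\mathbf{F} \in \mathcal{ER}$, $\beta > 3$, $\alpha = x/n > \alpha_+$ and $y = x - \alpha_+ n \gg \sqrt{n}$. Then there exists a fixed sequence $\Delta_N$ converging to zero slowly enough as $N \to \infty$, such that

$$\mathbf{P}(S_n \in \Delta_N[x)) = e^{-n\Lambda(\alpha_+) - \lambda_+ y}\, n \Delta_N v_+(y)(1 + o(1)) = e^{-n\Lambda(\alpha)}\, n \Delta_N v_+(y)(1 + o(1)),$$

where $v_+(y) = \lambda_+ V(y)/\psi(\lambda_+)$, the remainder term $o(1)$ is uniform in $x$ and $n$ such that $y \ge N\sqrt{n \ln n}$, $N$ being an arbitrary fixed sequence tending to $\infty$.

Proof By Theorem 9.2.1A there exists a sequence $\Delta_N$ converging to zero slowly enough such that (cf. (9.4.3))

$$\mathbf{P}(S_n \in \Delta_N[x)) = e^{-n\Lambda(\alpha_+) - \lambda_+ y}\, \mathbf{P}\big(S_n^{(\alpha_+)} - \alpha_+ n \in \Delta_N[y)\big). \tag{9.6.1}$$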




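Since by properties ($\Lambda$1) and ($\Lambda$2) the function $\Lambda(\alpha)$ is linear for $\alpha > \alpha_+$:

$$\Lambda(\alpha) = \Lambda(\alpha_+) + (\alpha - \alpha_+)\lambda_+,$$

the exponent in (9.6.1) can be rewritten as

$$-n\Lambda(\alpha_+) - \lambda_+ y = -n\Lambda(\alpha).$$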

The right tail of the distribution of $\xi^{(\alpha_+)}$ has the form (see (9.4.7))

$$\mathbf{P}(\xi^{(\alpha_+)} \ge t) = \frac{\lambda_+}{\psi(\lambda_+)} \int_t^{\infty} V(u)\,du + V(t).$$


By the properties of regularly varying functions (see Appendix 6),

$$V(t) - V(t - u) = o(V(t))$$

as $t \to \infty$ for any fixed $u$. This implies that condition [Do] of Sect. 9.5 is satisfied for the distribution of $\xi^{(\alpha_+)}$.

This means that, in order to calculate the probability on the right-hand side of (9.6.1), we can use Theorem 9.5.2 and Corollary 9.5.1, by virtue of which, as $\Delta_N \to 0$ slowly enough,

$$\mathbf{P}\big(S_n^{(\alpha_+)} - \alpha_+ n \in \Delta_N[y)\big) = n \Delta_N v_+(y)(1 + o(1)),$$

where the remainder term $o(1)$ is uniform in all $x$ and $n$ such that $y \ge N\sqrt{n \ln n}$, $N \to \infty$.

The  theorem  is  proved.  □ 


Since $\mathbf{P}(S_n \in \Delta_N[x))$ decreases exponentially fast as $x$ (or $y$) grows (note the factor $e^{-\lambda_+ y}$ in (9.6.1)), Theorem 9.6.1 immediately implies the following integral theorem.


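Corollary 9.6.1 Under the conditions of Theorem 9.6.1,

$$\mathbf{P}(S_n \ge x) = e^{-n\Lambda(\alpha)}\, \frac{nV(y)}{\psi(\lambda_+)}\,(1 + o(1)).$$

Proof Represent the probability $\mathbf{P}(S_n \ge x)$ as the sum

$$\mathbf{P}(S_n \ge x) = \sum_{k=0}^{\infty} \mathbf{P}(S_n \in \Delta_N[x + k\Delta_N)) \sim e^{-n\Lambda(\alpha)}\, \frac{n\lambda_+}{\psi(\lambda_+)} \sum_{k=0}^{\infty} \Delta_N V(y + \Delta_N k)\, e^{-\lambda_+ \Delta_N k}.$$

Here the series on the right-hand side is asymptotically equivalent, as $N \to \infty$, to the integral

$$V(y) \int_0^{\infty} e^{-\lambda_+ t}\,dt = \frac{V(y)}{\lambda_+}.$$

The corollary is proved. □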

Note  that  a  similar  corollary  (i.e.  the  integral  theorem)  can  be  obtained  under  the 
conditions  of  Theorem  9.4.2  as  well. 

In the range of deviations $\alpha = x/n > \alpha_+$, only the case $\mathbf{F} \in \mathcal{ER}$, $\beta \in [2, 3]$ (recall that $\alpha_+ = \infty$ for $\beta < 2$) has not been considered in this text. As we have already said, it could also be considered, but that would significantly increase the length and complexity of the exposition. Results dealing with this case can be found in [8]; one can also find there a more complete study of large deviation probabilities.


Chapter  10 

Renewal  Processes 


Abstract This is the first chapter in the book to deal with random processes in continuous time, namely, with the so-called renewal processes. Section 10.1 establishes the basic terminology and proves the integral renewal theorem in the case of non-identically distributed random variables. The classical Key Renewal Theorem in the arithmetic case is proved in Sect. 10.2, including its extension to the case where random variables can assume negative values. The limiting behaviour of the excess and defect of a random walk at a growing level is established in Sect. 10.3. Then these results are extended to the non-arithmetic case in Sect. 10.4. Section 10.5 is devoted to the Law of Large Numbers and the Central Limit Theorem for renewal processes. It also contains the proofs of these laws for the maxima of sums of independent non-identically distributed random variables that can take values of both signs, and a local limit theorem for the first hitting time of a growing level. The chapter ends with Sect. 10.6 introducing generalised (compound) renewal processes and establishing for them the Central Limit Theorem, in both integral and integro-local forms.


10.1  Renewal  Processes.  Renewal  Functions 

10.1.1  Introduction 

The  sequence  of  sums  of  random  variables  {Sn},  considered  in  previous  chapters,  is 
often  called  a  random  walk.  It  can  be  considered  as  the  simplest  random  process  in 
discrete  time  n.  The  further  study  of  such  processes  is  contained  in  Chaps.  11,  12 
and  20. 

In  this  chapter  we  consider  the  simplest  processes  in  continuous  time  t  that  are 
also  entirely  determined  by  a  sequence  of  independent  random  variables  and  do 
not  require,  for  their  construction,  any  special  structures  (in  the  general  case  such 
constructions  will  be  needed;  see  Chap.  18). 

Let $\tau_1$, $\{\tau_j\}_{j=2}^{\infty}$ be a sequence of independent random variables given on a probability space $(\Omega, \mathfrak{F}, \mathbf{P})$ (here we change our conventional notations $\xi_j$ to $\tau_j$ for reasons that will become clear in Sect. 10.6, where $\xi_j$ appear again). For the random variables $\tau_2, \tau_3, \ldots$ we will usually assume some homogeneity property: proximity of the expectations or identical distributions. The random variable $\tau_1$ can be arbitrary.

Definition 10.1.1 A renewal process is a collection of random variables $\eta(t)$ depending on a parameter $t$ and defined on $(\Omega, \mathfrak{F}, \mathbf{P})$ by the equality

$$\eta(t) := \min\{k \ge 0 : T_k > t\}, \quad t \ge 0, \tag{10.1.1}$$

where

$$T_k := \sum_{j=1}^{k} \tau_j, \qquad T_0 := 0.$$

The variables $\eta(t)$ are not completely defined yet. We do not know what $\eta(t)$ is for $\omega$ such that the level $t$ is never reached by the sequence of sums $T_k$. In that case it is natural to put

$$\eta(t) := \infty \quad \text{if all } T_k \le t. \tag{10.1.2}$$

Clearly, $\eta(t)$ is a stopping time (see Sect. 4.4).

Usually the random variables $\tau_2, \tau_3, \ldots$ are assumed to be identically distributed with a finite expectation. The distribution of the random variable $\tau_1$ can be arbitrary.

We assume first that all the random variables $\tau_j$ are positive. Then definition (10.1.1) allows us to consider $\eta(t)$ as a random function that can be described as follows. If we plot the points $T_0 = 0, T_1, T_2, \ldots$ on the real line, then one has $\eta(t) = 0$ on the semi-axis $(-\infty, 0)$, $\eta(t) = 1$ on the semi-interval $[0, T_1)$, $\eta(t) = 2$ on the semi-interval $[T_1, T_2)$ and so on.

The sequence $\{T_k\}_{k=0}^{\infty}$ is also often called a renewal process. Sometimes we will call the sequence $\{T_k\}$ a random walk. The quantity $\eta(t)$ can also be called the first passage time of the level $t$ by the random walk $\{T_k\}_{k=0}^{\infty}$.

If, based on the sequence $\{T_k\}$, we construct a random walk $T(x)$ in continuous time:

$$T(x) := T_k \quad \text{for } x \in [k, k + 1),\ k \ge 0,$$

then the renewal process $\eta(t)$ will be the generalised inverse of $T(x)$:

$$\eta(t) = \inf\{x \ge 0 : T(x) > t\}.$$

The term “renewal process” is related to the fact that the function $\eta(t)$ and the sequence $\{T_k\}$ are often used to describe the operation of various physical devices comprising replaceable components. If, say, $\tau_j$ is the failure-free operating time of such a component, after which the latter requires either replacement or repair (“renewal”, which is supposed to happen immediately), then $T_k$ will denote the time of the $k$-th “renewal” of the component, while $\eta(t)$ will be equal to the number of “renewals” which have occurred by the time $t$.
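Definition (10.1.1) translates directly into a short computation. The following is a minimal sketch for a finite sample of increments; the function name `renewal_process` is hypothetical, and returning `math.inf` when no partial sum of the given finite sample exceeds $t$ is an assumption that only mimics convention (10.1.2) for that sample.

```python
import math
from itertools import accumulate
from typing import Sequence


def renewal_process(taus: Sequence[float], t: float) -> float:
    """eta(t) = min{k >= 0 : T_k > t}, with T_0 = 0 and T_k = tau_1 + ... + tau_k.

    Returns math.inf if no partial sum of the supplied finite sample exceeds t.
    """
    partial_sums = [0, *accumulate(taus)]  # T_0, T_1, ..., T_n
    for k, t_k in enumerate(partial_sums):
        if t_k > t:
            return k
    return math.inf


if __name__ == "__main__":
    taus = [2, 3, 1]  # T_1 = 2, T_2 = 5, T_3 = 6
    for level in (-1, 0, 2, 5, 6):
        print(level, renewal_process(taus, level))
```

For positive increments the returned value is constant on each semi-interval $[T_{k-1}, T_k)$, which matches the step-function description given above.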
Remark 10.1.1 If the $j$-th renewal of the component does not happen immediately but requires a time $\tau'_j \ge 0$, then, introducing the random variables

$$\tau^*_j := \tau_j + \tau'_j, \qquad T^*_k := \sum_{j=1}^{k} \tau^*_j, \qquad \eta^*(t) := \min\{k : T^*_k > t\},$$

we get an object of the same nature as before, with nearly the same physical meaning. For such an object, a number of additional results can be obtained, see e.g., Remark 10.3.1.

Renewal  processes  are  also  quite  often  used  in  probabilistic  research  per  se,  and 
also  when  studying  other  processes  for  which  there  exist  so-called  “regeneration 
times”  after  which  the  evolution  of  the  process  starts  anew.  Below  we  will  encounter 
examples  of  such  use  of  renewal  processes. 

Now we return to the general case where $\tau_j$ may assume both positive and negative values.

Definition 10.1.2 The function

$$H(t) := \mathbf{E}\eta(t), \quad t \ge 0,$$

is called the renewal function for the sequence $\{T_k\}_{k=0}^{\infty}$.

In the existing literature, another definition is used more frequently.

Definition 10.1.2A The renewal function for the sequence $\{T_k\}_{k=0}^{\infty}$ is defined by

$$U(t) := \sum_{j=0}^{\infty} \mathbf{P}(T_j \le t).$$

The values of $H(t)$ and $U(t)$ can be infinite.
If $\tau_j \ge 0$ then the above definitions are equivalent. Indeed, for $t \ge 0$, consider the random variable

$$\nu(t) := \max\{k : T_k \le t\} = \eta(t) - 1.$$

Then clearly

$$\sum_{j=0}^{\infty} \mathbf{I}(T_j \le t) = 1 + \nu(t),$$

where $\mathbf{I}(A)$ is the indicator of the event $A$, and

$$U(t) = 1 + \mathbf{E}\nu(t) = \mathbf{E}\eta(t) = H(t).$$

The value $U(t) = \mathbf{E}\nu(t) + 1$ is the mean time spent by the trajectory $\{T_j\}_{j=0}^{\infty}$ in the interval $[0, t]$.

If $\tau_j$ can take values of different signs then clearly $\nu(t) \ge \eta(t) - 1$ and, with a positive probability, $\nu(t) > \eta(t) - 1$ (the trajectory $\{T_j\}$, after crossing the level $t$, can return to the region $(-\infty, t]$). Therefore in that case $U(t) > H(t)$. Thus for $\tau_j$ taking values of different signs we have two versions of the renewal function given in Definitions 10.1.2 and 10.1.2A. We will call them the first and the second versions, respectively. In the present chapter we will consider the first version only (Definition 10.1.2). The second version is discussed in Appendix 9.

Note that, for $\tau_j$ assuming values of both signs and $t < 0$, we have $H(t) = 0$, $U(t) > 0$, so the function $H(t)$ has a jump of magnitude 1 at the point $t = 0$.

Note also that the functions $H(t)$ and $U(t)$ we defined above are right-continuous. In the existing literature, one often considers left-continuous versions of renewal functions defined respectively as

$$H(t - 0) = \mathbf{E}\min\{k : T_k \ge t\} \quad \text{and} \quad U(t - 0) = \sum_{j=0}^{\infty} \mathbf{P}(T_j < t).$$

If all $\tau_j$ are identically distributed and $F^{*k}(t)$ is the $k$-fold convolution of the distribution function $F(t) = \mathbf{P}(\tau_j < t)$, then the second left-continuous version of the renewal function can also be represented in the form

$$U(t - 0) = \sum_{k=0}^{\infty} F^{*k}(t),$$

where $F^{*0}$ corresponds to the distribution degenerate at zero.

From  the  point  of  view  of  the  exposition  below,  it  makes  no  difference  which 
version  of  continuity  is  chosen.  For  several  reasons,  in  the  present  chapter  it  will  be 
more  convenient  for  us  to  deal  with  right-continuous  renewal  functions.  Everything 
below  will  equally  apply  to  left-continuous  renewal  functions  as  well. 


10.1.2  The  Integral  Renewal  Theorem  for  Non-identically 
Distributed  Summands 


In the case where $\tau_j$, $j \ge 2$, are not necessarily identically distributed and do not possess other homogeneity properties, singling out the random variable $\tau_1$ makes little sense.

Theorem 10.1.1 Let $\tau_j$, $j \ge 1$, be uniformly integrable from the right, $\mathbf{E}|T_N| < \infty$ for any fixed $N$ and $a_k = \mathbf{E}\tau_k \to a > 0$ as $k \to \infty$. Then the following limit exists

$$\lim_{t \to \infty} \frac{H(t)}{t} = \frac{1}{a}. \tag{10.1.3}$$


Proof  We  will  need  the  following  definition. 


Definition 10.1.3 The random variable

$$\chi(t) = T_{\eta(t)} - t > 0$$

is said to be the excess of the level $t$ (or overshoot over the level $t$) for the random walk $\{T_j\}$.
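Lemma 10.1.1 If $a_k \in [a_*, a^*]$, $a_* > 0$, then

$$\mathbf{E}\eta(t) \ge \frac{t}{a^*}, \qquad \limsup_{t \to \infty} \frac{\mathbf{E}\eta(t)}{t} \le \frac{1}{a_*}. \tag{10.1.4}$$

Proof By Theorem 4.4.2 (see also Example 4.4.3)

$$\mathbf{E}T_{\eta(t)} = t + \mathbf{E}\chi(t) \le a^* \mathbf{E}\eta(t).$$

This implies the first inequality in (10.1.4). Now introduce truncated random variables $\tau_j^{(s)} := \min(\tau_j, s)$. By virtue of the uniform integrability, one can choose an $s$ such that, for a given $\varepsilon \in (0, a_*)$, we would have

$$a_{j,s} := \mathbf{E}\tau_j^{(s)} \ge a_* - \varepsilon.$$

Then, by Theorem 4.4.2,

$$t + s \ge \mathbf{E}T^{(s)}_{\eta^{(s)}(t)} \ge (a_* - \varepsilon)\, \mathbf{E}\eta^{(s)}(t),$$

where

$$T^{(s)}_n := \sum_{j=1}^{n} \tau_j^{(s)}, \qquad \eta^{(s)}(t) := \min\{k : T^{(s)}_k > t\}.$$

Since $\eta(t) \le \eta^{(s)}(t)$, one has

$$H(t) = \mathbf{E}\eta(t) \le \mathbf{E}\eta^{(s)}(t) \le \frac{t + s}{a_* - \varepsilon}. \tag{10.1.5}$$

As $\varepsilon > 0$ can be chosen arbitrarily, we obtain that

$$\limsup_{t \to \infty} \frac{H(t)}{t} \le \frac{1}{a_*}.$$

The lemma is proved. □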

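We return to the proof of Theorem 10.1.1. For a given $\varepsilon > 0$, find an $N$ such that $a_k \in [a - \varepsilon, a + \varepsilon]$ for all $k \ge N$ and denote by $H_N(t)$ the renewal function corresponding to the sequence $\{T_{N+k}\}_{k=0}^{\infty}$. Then

$$H(t) = \mathbf{E}(\eta(t);\ T_N > t) + \int_{-\infty}^{t} \mathbf{P}(T_N \in du)\,[N + H_N(t - u)] = \mathbf{E}[H_N(t - T_N);\ T_N \le t] + r_N, \tag{10.1.6}$$

where

$$r_N := \mathbf{E}(\eta(t);\ T_N > t) + N\mathbf{P}(T_N \le t) \le N\mathbf{P}(T_N > t) + N\mathbf{P}(T_N \le t) = N.$$

Relation (10.1.5) implies that there exist constants $c_1$, $c_2$, such that, for all $t$,

$$H_N(t) \le c_1 + c_2 t.$$

Therefore, for fixed $N$ and $M$,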
Therefore,  for  fixed  N  and  M, 
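$$R_{N,M} := \mathbf{E}[H_N(t - T_N);\ |T_N| \ge M,\ T_N \le t] \le (c_1 + c_2 t)\,\mathbf{P}(|T_N| \ge M,\ T_N \le t) + c_2\, \mathbf{E}|T_N|.$$

Choose an $M$ such that $c_2 \mathbf{P}(|T_N| \ge M) < \varepsilon$. Then

$$\limsup_{t \to \infty} \frac{r_N + R_{N,M}}{t} \le \varepsilon. \tag{10.1.7}$$

To bound $H(t)$ in (10.1.6) it remains to consider, for the chosen $N$ and $M$, the function

$$H_{N,M}(t) := \mathbf{E}[H_N(t - T_N);\ |T_N| < M].$$

By Lemma 10.1.1,

$$\limsup_{t \to \infty} \frac{H_{N,M}(t)}{t} \le \frac{1}{a - \varepsilon}, \qquad \liminf_{t \to \infty} \frac{H_{N,M}(t)}{t} \ge \frac{\mathbf{P}(|T_N| < M)}{a + \varepsilon} \ge \frac{1 - \varepsilon/c_2}{a + \varepsilon}.$$

This together with (10.1.6) and (10.1.7) yields

$$\limsup_{t \to \infty} \frac{H(t)}{t} \le \varepsilon + \frac{1}{a - \varepsilon}, \qquad \liminf_{t \to \infty} \frac{H(t)}{t} \ge \frac{1 - \varepsilon/c_2}{a + \varepsilon}.$$

Since $\varepsilon$ is arbitrary, the foregoing implies (10.1.3).
The theorem is proved. □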


Remark 10.1.2 One can obtain the following generalisation of Theorem 10.1.1, in which no restrictions on $\tau_1 \ge 0$ are imposed. Let $\tau_1$ be an arbitrary nonnegative random variable, and $\tau^*_j := \tau_{1+j}$ satisfy the conditions of Theorem 10.1.1. Then (10.1.3) still holds true.

This assertion follows from the relations

$$H(t) = \mathbf{P}(\tau_1 > t) + \int_0^t \mathbf{P}(\tau_1 \in dv)\, H^*(t - v), \tag{10.1.8}$$

where $H^*(t)$ corresponds to the sequence $\{\tau^*_j\}$ and, for each fixed $N$ and $v \le N$,

$$\frac{H^*(t - v)}{t} = \frac{H^*(t - v)}{t - v} \cdot \frac{t - v}{t} \to \frac{1}{a}$$

as $t \to \infty$. Therefore

$$\frac{1}{t} \int_0^N \mathbf{P}(\tau_1 \in dv)\, H^*(t - v) \to \frac{\mathbf{P}(\tau_1 \le N)}{a}.$$

For the remaining part of the integral in (10.1.8), we have

$$\limsup_{t \to \infty} \frac{1}{t} \int_N^t \mathbf{P}(\tau_1 \in dv)\, H^*(t - v) \le \limsup_{t \to \infty} \frac{H^*(t)}{t}\, \mathbf{P}(\tau_1 > N) = \frac{\mathbf{P}(\tau_1 > N)}{a}.$$

Since the probability $\mathbf{P}(\tau_1 > N)$ can be made arbitrarily small by the choice of $N$, the assertion is proved. □
It is not difficult to verify that the condition $\tau_1 \ge 0$ can be relaxed to the condition $\mathbf{E}\min(0, \tau_1) > -\infty$. However, if $\mathbf{E}\min(0, \tau_1) = -\infty$, then $H(t) = \infty$ and relation (10.1.3) does not hold.

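Obtaining an analogue of Theorem 10.1.1 for the second version $U(t)$ of the renewal function in the case of uniformly integrable $\tau_j$ taking values of both signs is accompanied by greater technical difficulties and additional conditions. For a fixed $\varepsilon > 0$, split the series $U(t) = \sum_{n=0}^{\infty} \mathbf{P}(T_n \le t)$ into the three parts

$$\Sigma_1 := \sum_{n < t(1-\varepsilon)/a}, \qquad \Sigma_2 := \sum_{|n - t/a| \le t\varepsilon/a}, \qquad \Sigma_3 := \sum_{n > t(1+\varepsilon)/a}.$$

By the law of large numbers (see Corollary 8.3.2),

$$\frac{T_n}{na} \xrightarrow{p} 1 \quad \text{as } n \to \infty.$$

Therefore, for $n < t(1-\varepsilon)/a$,

$$\mathbf{P}(T_n \le t) \ge \mathbf{P}\Big(\frac{T_n}{na} \le \frac{1}{1 - \varepsilon}\Big) \to 1,$$

and hence

$$\frac{1}{t}\,\Sigma_1 \to \frac{1 - \varepsilon}{a}.$$

The second sum allows the trivial bound

$$\frac{1}{t}\,\Sigma_2 < \frac{2\varepsilon}{a},$$

where the right-hand side can be made arbitrarily small by the choice of $\varepsilon$.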

The main difficulties are related to estimating $\Sigma_3$. To illustrate the problems arising here we confine ourselves to the case of identically distributed $\tau_j \stackrel{d}{=} \tau$. In this case the required estimate for $\Sigma_3$ can only be obtained under the condition $\mathbf{E}(\tau^-)^2 < \infty$, $\tau^- := \max(0, -\tau)$. Assume without losing generality that $\mathbf{E}\tau^2 < \infty$. (If $\mathbf{E}(\tau^+)^2 = \infty$, $\tau^+ := \max(0, \tau)$, then introducing truncated random variables $\tau_j^{(s)} = \min(s, \tau_j)$, we obtain, using obvious conventions concerning notations, that $\mathbf{P}(T_n \le t) \le \mathbf{P}(T_n^{(s)} \le t)$, $U(t) \le U^{(s)}(t)$ and $\Sigma_3 \le \Sigma_3^{(s)}$, where $\mathbf{E}(\tau^{(s)})^2 < \infty$ and the value of $\mathbf{E}\tau^{(s)}$ can be made arbitrarily close to $a$ by the choice of $s$.) In the case $\mathbf{E}\tau^2 < \infty$ we can use Theorem 9.5.1 by virtue of which, for a regularly varying left tail $W(t) = \mathbf{P}(\tau < -t) = t^{-\beta}L(t)$ ($L(t)$ is a slowly varying function) and $n > \frac{t}{a}(1 + \varepsilon)$, we have

$$\mathbf{P}(T_n \le t) = \mathbf{P}(T_n - an \le -(an - t)) \sim nW(an - t).$$

By the properties of slowly varying functions (see Appendix 6), for the values $u = n/t$ comparable to 1, $n > \frac{t}{a}(1 + \varepsilon)$ and $t \to \infty$, we have

$$\frac{W(an - t)}{W(\varepsilon t)} \sim \Big(\frac{au - 1}{\varepsilon}\Big)^{-\beta}.$$



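Thus for $\beta > 2$, as $t \to \infty$,

$$\Sigma_3 = \sum_{n > t(1+\varepsilon)/a} \mathbf{P}(T_n \le t) \sim \int_{t(1+\varepsilon)/a}^{\infty} vW(av - t)\,dv \sim t^2 W(\varepsilon t) \int_{(1+\varepsilon)/a}^{\infty} u \Big(\frac{au - 1}{\varepsilon}\Big)^{-\beta} du \sim c(\varepsilon)\, t^2 W(t) = o(1).$$

Summarising, we have obtained that

$$\lim_{t \to \infty} \frac{U(t)}{t} = \frac{1}{a}.$$

Now if $\mathbf{E}(\tau^-)^2 = \infty$ then $U(t) = \infty$ for all $t$. In this case, instead of $U(t)$ one studies the “local” renewal function

$$U(t, h) = \sum_{n} \mathbf{P}(T_n \in (t, t + h])$$

which is always finite provided that $a > 0$ and has all the properties of the increment $H(t + h) - H(t)$ to be studied below (see e.g. [12]).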

In view of the foregoing and since the function $H(t)$ will be of principal interest to us, in what follows we will restrict ourselves to studying the first version of the renewal function, as was noted above. We will mainly pay attention to the asymptotic behaviour of the increments $H(t + h) - H(t)$ as $t \to \infty$. To this is closely related a more general problem that often appears in applications: the problem on the asymptotic behaviour as $t \to \infty$ of integrals (see e.g. Chap. 13)

$$\int_0^t g(t - y)\,dH(y) \tag{10.1.9}$$

for functions $g(v)$ such that

$$\int_0^{\infty} |g(v)|\,dv < \infty.$$

Theorems describing the asymptotic behaviour of (10.1.9) will be called the key renewal theorems. The next sections and Appendix 9 will be devoted to these theorems. Due to the technical complexity of the mentioned problems, we will confine ourselves to considering only the case where $\tau_j$, $j \ge 2$, are identically distributed.

Note that in some special cases the above problems can be solved in a very simple way, since the renewal function $H(t)$ can be found there explicitly. To do this, as it follows from Wald's identity used above, it suffices to find $\mathbf{E}\chi(t)$ in explicit form. If, for instance, $\tau_j$ are integer-valued, $\mathbf{P}(\tau_j = 1) > 0$ and $\mathbf{P}(\tau_j \ge 2) = 0$, for all $j \ge 1$, then $\chi(t) = 1$ and Wald's identity yields $H(t) = (t + 1)/a$. Similar equalities will hold if $\mathbf{P}(\tau_j > t) = ce^{-\gamma t}$ for $t > 0$ and $\gamma > 0$ (if $\tau_j$ are integer-valued, then $t$ takes only integer values in this formula). In that case the distribution of $\chi(t)$ will be exponential and will not depend on $t$ (for more details, see the exposition below and also Chap. 15).
10.2  The  Key  Renewal  Theorem  in  the  Arithmetic  Case

We will distinguish between two distribution types for $\tau_j$: arithmetic in an extended sense (when the lattice span is not necessarily 1; for the definition of arithmetic distributions see Sect. 7.1) and all other distributions that we will call non-arithmetic. It is clear that, say, a random variable taking values 1 and $\sqrt{2}$ with positive probabilities cannot be arithmetic.

In the present section, we will consider the arithmetic case. Without loss of generality, we will assume that the lattice span is 1. Then the functions $\mathbf{P}(\tau_j < t)$ and $H(t)$ will be completely determined by their values at integer points $t = k$, $k = 0, 1, 2, \ldots$

First we consider the case where the $\tau_j$ are positive, $\tau_j \stackrel{d}{=} \tau$ for $j \ge 2$. In that case, the difference

$$h(k) := H(k) - H(k - 1) = \sum_{j=0}^{\infty} \mathbf{P}(T_j = k), \quad k \ge 1,$$

is equal to the expectation of the number of visits of the point $k$ by the walk $\{T_j\}$. Put

$$q_k := \mathbf{P}(\tau_1 = k), \qquad p_k := \mathbf{P}(\tau = k).$$

Definition 10.2.1 A renewal process $\eta(t)$ will be called homogeneous and denoted by $\eta_0(t)$ if

$$q_k = \frac{1}{a} \sum_{j=k}^{\infty} p_j, \quad k = 1, 2, \ldots, \qquad a = \mathbf{E}\tau. \tag{10.2.1}$$

If we denote by $p(z)$ the generating function

$$p(z) = \mathbf{E}z^{\tau} = \sum_{k=1}^{\infty} p_k z^k,$$

then the generating function $q(z) = \mathbf{E}z^{\tau_1} = \sum_{k=1}^{\infty} q_k z^k$ will be equal to

$$q(z) = \frac{1}{a} \sum_{k=1}^{\infty} z^k \sum_{j=k}^{\infty} p_j = \frac{z(1 - p(z))}{a(1 - z)}.$$

As we will see below, the term “homogeneous” for the process $\eta_0(t)$ is quite justified. One of the reasons for its use is the following exact (non-asymptotic) equality.


Theorem 10.2.1 For a homogeneous renewal process $\eta_0(t)$, one has

$$H_0(k) := \mathbf{E}\eta_0(k) = 1 + \frac{k}{a}.$$


Proof Consider the generating function $r(z)$ for the sequence $h_0(k) = H_0(k) - H_0(k-1)$:

$$r(z) = \sum_{k=1}^{\infty} z^k h_0(k) = \sum_{j=1}^{\infty} \sum_{k=1}^{\infty} z^k \mathbf{P}(T_j = k) = \sum_{j=1}^{\infty} \mathbf{E} z^{T_j} = q(z) \sum_{j=0}^{\infty} p^j(z) = \frac{q(z)}{1 - p(z)} = \frac{z}{a(1 - z)}.$$

This implies that $h_0(k) = 1/a$. Since $H_0(0) = 1$, one has $H_0(k) = 1 + k/a$. The theorem is proved. □
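The identity $h_0(k) = 1/a$ can be checked numerically. The following is a minimal sketch in exact rational arithmetic, assuming a hypothetical two-point step distribution $p_1 = p_3 = 1/2$; the function names are illustrative and are not part of the text. It builds the first-step law (10.2.1), conditions on the first renewal epoch, and evaluates the increments of $H_0$.

```python
from fractions import Fraction


def renewal_sequence(p, n):
    """Return u[0..n], where u[m] is the expected number of visits to m
    by the walk 0, tau_1, tau_1 + tau_2, ... with i.i.d. steps of pmf p."""
    u = [Fraction(1)] + [Fraction(0)] * n
    for m in range(1, n + 1):
        u[m] = sum(p.get(j, Fraction(0)) * u[m - j] for j in range(1, m + 1))
    return u


def homogeneous_increments(p, n):
    """Return (a, q, h0), where h0[k - 1] = H_0(k) - H_0(k - 1) for k = 1..n."""
    a = sum(k * pk for k, pk in p.items())
    # First-step law (10.2.1): q_k = (1/a) * sum_{j >= k} p_j.
    q = {k: sum(pj for j, pj in p.items() if j >= k) / a
         for k in range(1, max(p) + 1)}
    u = renewal_sequence(p, n)
    # Condition on the first renewal epoch tau_1 = i, then use u for the rest.
    h0 = [sum(q.get(i, Fraction(0)) * u[k - i] for i in range(1, k + 1))
          for k in range(1, n + 1)]
    return a, q, h0


# Hypothetical step distribution chosen only for illustration.
p = {1: Fraction(1, 2), 3: Fraction(1, 2)}
a, q, h0 = homogeneous_increments(p, 12)
print(a, h0[:4])
```

Because the arithmetic is exact, every entry of `h0` should coincide with $1/a$, not merely approximate it.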


Sometimes the process $\eta_0(t)$ is also called stationary. As we will see below, it would be more appropriate to call it a process with stationary increments (see Sect. 22.1).

The asymptotic regular behaviour of the function $h(k)$ as $k \to \infty$ persists in the case of arbitrary $\tau_1$ as well.

Denote by $d$ the greatest common divisor (g.c.d.) of the possible values of $\tau$:

$$d := \text{g.c.d.}\{k : p_k > 0\},$$

and let $g(k)$, $k = 0, 1, \ldots$, be an arbitrary sequence such that

$$\sum_{k=0}^{\infty} |g(k)| < \infty.$$


Theorem 10.2.2 (The key renewal theorem) If $d = 1$, $\tau_1$ is an arbitrary integer-valued random variable and $\tau_j \stackrel{d}{=} \tau > 0$ for $j \ge 2$, then, as $k \to \infty$,

$$h(k) := H(k) - H(k-1) \to \frac{1}{a}, \qquad \sum_{l=1}^{k} h(l)\, g(k-l) \to \frac{1}{a} \sum_{m=0}^{\infty} g(m).$$

These two relations are equivalent.

The first assertion of the theorem is also called the local renewal theorem.
To prove the theorem we will need two auxiliary assertions.


Lemma 10.2.1 Let all $\tau_j$ be identically distributed and $\nu \ge 1$ be a Markov time with respect to the collection of $\sigma$-algebras $\{\mathfrak{F}_n\}$, where $\mathfrak{F}_n \supseteq \sigma(\tau_1, \ldots, \tau_n)$ is independent of $\sigma(\tau_{n+1}, \tau_{n+2}, \ldots)$. Then the $\sigma$-algebra generated by the random variables $\nu, \tau_1, \ldots, \tau_\nu$, and the $\sigma$-algebra $\sigma\{\tau_{\nu+1}, \tau_{\nu+2}, \ldots\}$ are independent. The sequence $\{\tau_{\nu+1}, \tau_{\nu+2}, \ldots\}$ has the same distribution as $\{\tau_1, \tau_2, \ldots\}$.

Thus, in spite of their random numbers, the elements of the sequence $\tau_{\nu+j}$ are distributed as $\tau_j$.

Proof For given Borel sets $B_1, B_2, \ldots, C_1, C_2, \ldots$ put

$$A := \{\nu \in N,\ \tau_1 \in B_1, \ldots, \tau_\nu \in B_\nu\}, \qquad D_\nu := \{\tau_{\nu+1} \in C_1, \ldots, \tau_{\nu+k} \in C_k\},$$

where $N$ is a given set of integers and $k$ is arbitrary. Since $\mathbf{P}(D_j) = \mathbf{P}(D_0)$ and the events $D_j$ and $\{\nu = j\}$ are independent, the total probability formula yields

$$\mathbf{P}(D_\nu) = \sum_{j=1}^{\infty} \mathbf{P}(\nu = j,\ D_j) = \sum_{j=1}^{\infty} \mathbf{P}(\nu = j)\, \mathbf{P}(D_j) = \mathbf{P}(D_0).$$

Therefore, by Theorem 3.4.3, in order to prove the required independence of the $\sigma$-algebras, it suffices to show that $\mathbf{P}(D_\nu A) = \mathbf{P}(D_0)\mathbf{P}(A)$.

By the total probability formula,

$$\mathbf{P}(D_\nu A) = \sum_{j \in N} \mathbf{P}(D_\nu A\{\nu = j\}) = \sum_{j \in N} \mathbf{P}(D_j A\{\nu = j\}).$$

But the event $A\{\nu = j\}$ belongs to $\mathfrak{F}_j$, whereas $D_j \in \sigma(\tau_{j+1}, \ldots, \tau_{j+k})$. Therefore $D_j$ and $A\{\nu = j\}$ are independent events and

$$\mathbf{P}(D_j A\{\nu = j\}) = \mathbf{P}(D_j)\, \mathbf{P}(A\{\nu = j\}) = \mathbf{P}(D_0)\, \mathbf{P}(A\{\nu = j\}), \quad j \ge 1.$$

Summing over $j \in N$, we obtain $\mathbf{P}(D_\nu A) = \mathbf{P}(D_0)\mathbf{P}(A)$. The lemma is proved. □

Lemma 10.2.2 Let $\zeta_1, \zeta_2, \ldots$ be independent arithmetic identically and symmetrically distributed random variables with zero expectation $\mathbf{E}\zeta_j = 0$. Put $Z_n := \sum_{j=1}^{n} \zeta_j$. Then, for any integer $k$,

$$\nu_k := \min\{n : Z_n = k\}$$

is a proper random variable: $\mathbf{P}(\nu_k < \infty) = 1$.

The proof of the lemma is given in Sect. 13.3 (see Corollary 13.3.1).


Proof of Theorem 10.2.2 Consider two independent sequences of random variables (we assume that they are given on a common probability space): a sequence $\tau_1, \tau_2, \ldots$, where $\tau_1$ has an arbitrary distribution, and a sequence $\tau'_1, \tau'_2, \ldots$, where $\mathbf{P}(\tau'_1 = k) = q_k$ (see (10.2.1)), and $\mathbf{P}(\tau'_j = k) = \mathbf{P}(\tau_j = k) = p_k$ for $j \ge 2$ (so that $\tau'_j \stackrel{d}{=} \tau_j$ for $j \ge 2$; the process $\eta'(t)$ constructed from the sums $T'_k = \sum_{j=1}^{k} \tau'_j$ is homogeneous (see Definition 10.2.1)).

Set $\nu := \min\{n \ge 1 : T_n = T'_n\}$. It is clearly a Markov time with respect to the sequence $\{\tau_j, \tau'_j\}$. We show that $\mathbf{P}(\nu < \infty) = 1$. Put

$$Z_n := \sum_{j=2}^{n} (\tau_j - \tau'_j) \ \text{ for } n \ge 2, \qquad Z_1 := 0, \qquad Z_0 := \tau_1 - \tau'_1.$$

Then

$$\nu = \min\{n \ge 1 : Z_n = -Z_0\}.$$

By Lemma 10.2.2 (the variables $\zeta_j = \tau_j - \tau'_j$ have a symmetric distribution for $j \ge 2$), for each integer $k$ the variable $\nu_k = \min\{n \ge 1 : Z_n = k\}$ is proper. Since $Z_n$ for $n \ge 1$ and $Z_0$ are independent, we have

$$\mathbf{P}(\nu < \infty) = \sum_{k} \mathbf{P}(Z_0 = -k)\, \mathbf{P}(\nu_k < \infty) = 1.$$

Now we will "glue together" ("couple") the sequences $\{T_n\}$ and $\{T'_n\}$. Since $T_\nu = T'_\nu$ and $\nu$ is a Markov time, by Lemma 10.2.1 one can replace $\tau_{\nu+1}, \tau_{\nu+2}, \ldots$ with $\tau'_{\nu+1}, \tau'_{\nu+2}, \ldots$ (and thereby replace $T_{\nu+1}, T_{\nu+2}, \ldots$ with $T'_{\nu+1}, T'_{\nu+2}, \ldots$) without changing the distribution of the sequence $\{T_n\}$.

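Therefore, on the set $\{T_\nu < k\}$ one has $\eta(t) = \eta'(t)$ for $t \ge k - 1$ and hence, since $\mathbf{E}(\eta'(k) - \eta'(k-1)) = 1/a$ by Theorem 10.2.1,

$$\begin{aligned}
h(k) &= \mathbf{E}(\eta(k) - \eta(k-1)) \\
&= \mathbf{E}[\eta'(k) - \eta'(k-1);\ T_\nu < k] + \mathbf{E}[\eta(k) - \eta(k-1);\ T_\nu \ge k] \\
&= \frac{1}{a} - \mathbf{E}[\eta'(k) - \eta'(k-1);\ T_\nu \ge k] + \mathbf{E}[\eta(k) - \eta(k-1);\ T_\nu \ge k].
\end{aligned}$$

Since $|\eta(k) - \eta(k-1)| \le 1$, we have

$$\left| h(k) - \frac{1}{a} \right| \le \mathbf{P}(T_\nu \ge k) \to 0$$

as $k \to \infty$. The first assertion of Theorem 10.2.2 is proved.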
Since $h(k) \le 1$, we can make the value of

$$\left| \sum_{l=1}^{k-N-1} h(l)\, g(k-l) \right| \le \sum_{l=N+1}^{k-1} |g(l)| \le \sum_{l=N+1}^{\infty} |g(l)|$$

arbitrarily small by choosing an appropriate $N$. Moreover, by virtue of the first assertion, for any fixed $N$,

$$\sum_{l=k-N}^{k} h(l)\, g(k-l) \to \frac{1}{a} \sum_{l=0}^{N} g(l) \quad \text{as } k \to \infty.$$

This implies the second assertion of the theorem.

□
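Both assertions of Theorem 10.2.2 are easy to observe numerically. The sketch below is an illustration under assumed inputs, not part of the proof: the step law $p_2 = p_3 = 1/2$ (so that $d = 1$, $a = 2.5$), the degenerate first step $\tau_1 = 5$, and the summable sequence $g(m) = 2^{-m}$ are all hypothetical choices. It computes $h(k) = \sum_{j \ge 1} \mathbf{P}(T_j = k)$ by conditioning on $\tau_1$ and compares $h(k)$ and the convolution sum with their limits.

```python
def renewal_increments(p, first, n):
    """Return h[0..n] with h[k] = sum_{j >= 1} P(T_j = k), where the first
    step has pmf `first` and all later steps have pmf `p` (positive integers)."""
    # u[m]: expected number of visits to m by the walk with i.i.d. steps ~ p.
    u = [1.0] + [0.0] * n
    for m in range(1, n + 1):
        u[m] = sum(pj * u[m - j] for j, pj in p.items() if j <= m)
    # Condition on the value i of the first step.
    return [sum(fi * u[k - i] for i, fi in first.items() if i <= k)
            for k in range(n + 1)]


p = {2: 0.5, 3: 0.5}      # assumed step law: g.c.d. of the support is 1
first = {5: 1.0}          # assumed (arbitrary) law of tau_1
a = sum(k * pk for k, pk in p.items())
n = 200
h = renewal_increments(p, first, n)


def g(m):
    """Summable test sequence with sum_{m >= 0} g(m) = 2."""
    return 0.5 ** m


key_sum = sum(h[l] * g(n - l) for l in range(1, n + 1))
print(h[n], 1 / a, key_sum, 2 / a)
```

For this particular step law the convergence is geometric, so already at $k = 200$ the computed values agree with $1/a$ and $a^{-1}\sum_m g(m)$ to within floating-point accuracy.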


Remark 10.2.1 The coupling of $\{T_n\}$ and $\{T'_n\}$ in the proof of Theorem 10.2.2 could be done earlier, at the time $\gamma := \min\{n \ge 1 : T_n \in \mathcal{T}'\}$, where $\mathcal{T}'$ is the set of points $\mathcal{T}' = \{T'_1, T'_2, \ldots\}$.

Theorem 10.2.3 The assertion of Theorem 10.2.2 remains true for arbitrary (assuming values of different signs) $\tau_j$.

Proof We will reduce the problem to the case $\tau_j > 0$. First let all $\tau_j$ be identically distributed. Consider the random variable $\chi_1 = \chi(0)$ that we will call the first positive sum. We will show in Chap. 12 (see Corollary 12.2.3) that $\mathbf{E}\chi_1 < \infty$ if $a = \mathbf{E}\tau_j < \infty$. According to Lemma 10.2.1, the sequence $\tau_{\eta(0)+1}, \tau_{\eta(0)+2}, \ldots$ will have the same distribution as $\tau_1, \tau_2, \ldots$ Therefore the "second positive sum" $\chi_2$ or, which is the same, the first positive sum of the variables $\tau_{\eta(0)+1}, \tau_{\eta(0)+2}, \ldots$ will have the same distribution as $\chi_1$ and will be independent of it. The same will be true for the subsequent "overshoots" over the already achieved levels $\chi_1, \chi_1 + \chi_2, \ldots$

Now consider the random walk

$$\left\{ H_k = \sum_{i=1}^{k} \chi_i \right\}_{k=1}^{\infty}$$

and put

$$\eta^*(t) := \min\{k : H_k > t\}, \qquad \chi^*(t) := H_{\eta^*(t)} - t, \qquad H^*(t) := \mathbf{E}\eta^*(t).$$

Since $\chi_k > 0$, Theorem 10.2.2 is applicable, and therefore by Wald's identity

$$H^*(k) - H^*(k-1) = \frac{1}{\mathbf{E}\chi_1}\bigl(1 + \mathbf{E}\chi^*(k) - \mathbf{E}\chi^*(k-1)\bigr) \to \frac{1}{\mathbf{E}\chi_1},$$

$$\mathbf{E}\chi^*(k) - \mathbf{E}\chi^*(k-1) \to 0.$$

Note now that the distributions of the random variables $\chi(t)$ (see Definition 10.1.3) and $\chi^*(t)$ coincide. Therefore

$$H(k) - H(k-1) = \frac{1}{a}\bigl(1 + \mathbf{E}\chi(k) - \mathbf{E}\chi(k-1)\bigr) = \frac{1}{a}\bigl(1 + \mathbf{E}\chi^*(k) - \mathbf{E}\chi^*(k-1)\bigr) \to \frac{1}{a}.$$

Now let the distributions of $\tau_1$ and $\tau_j$, $j \ge 2$, be different. Then the renewal function $H_1(t)$ for such a walk will be equal to

$$H_1(k) = \mathbf{P}(\tau_1 > k) + \sum_{i=-\infty}^{k} \mathbf{P}(\tau_1 = i)\,[H(k-i) + 1] = 1 + \sum_{i=-\infty}^{k} \mathbf{P}(\tau_1 = i)\, H(k-i),$$

$$h_1(k) = H_1(k) - H_1(k-1) = \sum_{i=-\infty}^{k} \mathbf{P}(\tau_1 = i)\, h(k-i), \quad k \ge 0,$$

where $H_1(-1) = 0$, $h(0) = H(0)$ and the function $H(t)$ corresponds to identically distributed $\tau_j$. If we had $h(k) < c < \infty$ for all $k$, that would imply convergence $h_1(k) \to 1/a$ and thus complete the proof of the theorem.

The required inequality $h(k) < c$ actually follows from the following general proposition, which is true for arbitrary (not necessarily lattice) random variables $\tau_j$.

Lemma 10.2.3 If all $\tau_j$ are identically distributed then, for all $t$ and $u$,

$$H(t+u) - H(t) \le H(u) \le c_1 + c_2 u.$$

Proof The difference $\eta(t+u) - \eta(t)$ is the number of jumps of the trajectory $\{T'_k\}$ that started at the point $t + \chi(t) > t$ until the first passage of the level $t + u$, where the sequence $\{T'_k\}$ has the same distribution as $\{T_k\}$ and is independent of it (see Lemma 10.2.1). In other words, $\eta(t+u) - \eta(t)$ has the same distribution as $\eta'(u - \chi(t)) \le \eta'(u)$, where $\eta'$ corresponds to $\{T'_k\}$, if $\chi(t) \le u$, and $\eta(t+u) - \eta(t) = 0$ if $\chi(t) > u$. Therefore $H(t+u) - H(t) \le H(u)$. The inequality for $H(u)$ follows from Theorem 10.2.1. The lemma is proved. □

Theorem 10.2.3 is proved. □

10.3 The Excess and Defect of a Random Walk. Their Limiting
Distribution in the Arithmetic Case

Along with the excess $\chi(t) = T_{\eta(t)} - t$ we introduce one more random variable closely related to $\chi(t)$.

Definition 10.3.1 The random variable

$$\gamma(t) := t - T_{\eta(t)-1} = t - T_{\nu(t)}$$

is called the defect (or undershoot) of the level $t$ in the walk $\{T_n\}$.

The quantity $\chi(t)$ may be thought of as the time during which the component that was working at time $t$ will continue working after that time, while $\gamma(t)$ is the time for which the component has already been working by that time.

One should not think that the sum $\chi(t) + \gamma(t)$ has the same distribution as $\tau_j$; this sum is actually equal to the value of a $\tau$ with the random subscript $\eta(t)$. In particular, as we will see below, it may turn out that $\mathbf{E}\chi(t) > \mathbf{E}\tau_j$ for large $t$. The following apparent paradox is related to this fact. A passenger coming to a bus stop at which buses arrive with inter-arrival times $\tau_1 > 0, \tau_2 > 0, \ldots$ ($\mathbf{E}\tau_j = a$) will wait for the arrival of the next bus for a random time $\chi$ whose mean $\mathbf{E}\chi$ could prove to be greater than $a$.

One of the principal facts of renewal theory is the assertion that, under broad assumptions, the joint distribution of $\chi(t)$ and $\gamma(t)$ has a limit as $t \to \infty$, so that for large $t$ the distribution of $\chi(t)$ does not depend on $t$ any more and becomes stationary. Denote this limiting distribution of $\chi(t)$ by $\mathbf{G}$ and its distribution function by $G$:

$$G(x) = \lim_{t \to \infty} \mathbf{P}(\chi(t) < x). \tag{10.3.1}$$

If we take the distribution of $\tau_1$ to be $\mathbf{G}$ then, for such a process, by its very construction the distribution of the variable $\chi(t)$ will be independent of $t$. Indeed, in that case we can think of the positive elements of $\{T_j\}$ as the renewal times for a process which is constructed from the sequence $\{\tau_j\}$ and whose start is shifted to a point $-A$, where $A$ is very large. Since by virtue of (10.3.1) we can assume that the distributions of $\chi(A)$ and $\chi(A + t)$ coincide with each other, the distribution of the variable $\chi(t)$ (which can be identified with $\chi(A + t)$) is independent of $t$ and coincides with that of $\tau_1$. A formal proof of this fact is omitted, since it will not be used in what follows. However, the reader could carry it out using the explicit form of $G(x)$ from (10.3.1) to be derived below.

In the arithmetic case, the distribution $\mathbf{G}$ is just the law (10.2.1) used to construct the homogeneous renewal process $\eta_0(t)$. We will prove this in our next theorem.

For the process $\eta_0(t)$, the distribution of $\chi(t)$ does not depend on $t$ and coincides with that of $\tau_1$. It follows that the distribution of $\eta_0(t+u) - \eta_0(t)$ coincides with that of $\eta_0(u)$ and hence is also independent of $t$. It is this property that establishes the stationarity of the increments of the renewal process; we called this property homogeneity. It means that the distribution of the number of renewals over a time interval of length $u$ does not depend on when we start counting, and therefore depends on $u$ only.

Theorems on the limiting distribution of $\chi(t)$ and $\gamma(t)$ are of interest not only from the point of view of their applications. We will need them for a variety of other problems. Again we consider first the case when the variables $\tau_j > 0$ are arithmetic. In that case the "time" can also be assumed discrete and we will denote it, as before, by the letters $n$ and $k$. Let, as before, $\tau_j \stackrel{d}{=} \tau$ for $j \ge 2$ and $p_k = \mathbf{P}(\tau = k)$.


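Theorem 10.3.1 Let the random variable $\tau > 0$ be arithmetic, $\mathbf{E}\tau = a$ exist, $\tau_1$ be an arbitrary integer random variable, and the g.c.d. of the possible values of $\tau$ be equal to 1. Then the following limit exists:

$$\lim_{k \to \infty} \mathbf{P}(\gamma(k) = i,\ \chi(k) = j) = \frac{p_{i+j}}{a}, \quad i \ge 0,\ j > 0. \tag{10.3.2}$$

It follows from Theorem 10.3.1 that

$$\begin{aligned}
\lim_{k \to \infty} \mathbf{P}(\chi(k) = i) &= \frac{1}{a} \sum_{j=i}^{\infty} p_j, \quad i > 0; \\
\lim_{k \to \infty} \mathbf{P}(\gamma(k) = i) &= \frac{1}{a} \sum_{j=i+1}^{\infty} p_j, \quad i \ge 0.
\end{aligned} \tag{10.3.3}$$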


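Proof of Theorem 10.3.1 By the renewal theorem (see Theorem 10.2.2), for $k > i$,

$$\begin{aligned}
\mathbf{P}(\gamma(k) = i,\ \chi(k) = j) &= \sum_{l=1}^{\infty} \mathbf{P}(T_l = k - i,\ \tau_{l+1} = i + j) \\
&= \sum_{l=1}^{\infty} \mathbf{P}(T_l = k - i)\, \mathbf{P}(\tau = i + j) = h(k-i)\, p_{i+j} \to \frac{p_{i+j}}{a}
\end{aligned}$$

as $k \to \infty$. The theorem is proved. □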
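The limit (10.3.2) and the bus-stop paradox mentioned above can be illustrated by an exact finite-$k$ computation. The sketch below assumes identically distributed steps with a hypothetical law $p_1 = 0.9$, $p_{11} = 0.1$ (so $a = 2$ and $\mathbf{E}\tau^2 = 13$); in that case the formula in the proof reads $\mathbf{P}(\gamma(k) = i, \chi(k) = j) = u(k-i)\,p_{i+j}$ for $0 \le i \le k$, where $u(m)$ is the expected number of visits to $m$ (with $u(0) = 1$ accounting for the start of the walk).

```python
def renewal_sequence(p, n):
    """u[m] = expected number of visits to m by the walk with i.i.d. steps ~ p."""
    u = [1.0] + [0.0] * n
    for m in range(1, n + 1):
        u[m] = sum(pj * u[m - j] for j, pj in p.items() if j <= m)
    return u


def joint_defect_excess(p, k):
    """Exact P(gamma(k) = i, chi(k) = j) for i.i.d. positive integer steps.

    A step of size s = i + j covering the point k must start at k - i,
    which happens with 'probability weight' u[k - i].
    """
    u = renewal_sequence(p, k)
    return {(i, s - i): u[k - i] * ps
            for s, ps in p.items()
            for i in range(0, min(k, s - 1) + 1)}


p = {1: 0.9, 11: 0.1}     # assumed step law, chosen to make the paradox visible
a = sum(s * ps for s, ps in p.items())
m2 = sum(s * s * ps for s, ps in p.items())
k = 400

joint = joint_defect_excess(p, k)
mean_excess = sum(j * pr for (i, j), pr in joint.items())
print(sum(joint.values()), joint[(3, 8)], p[11] / a)
print(mean_excess, (m2 + a) / (2 * a), a)
```

With these assumed numbers the mean excess is close to $3.75$, which exceeds the mean step $a = 2$: a long step is more likely to cover a given point than a short one.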

If $\mathbf{E}\tau^2 = m_2 < \infty$, then Theorem 10.3.1 allows a refinement of Theorem 10.2.2 (see Theorem 10.3.2 below).

Corollary 10.3.1 If $m_2 < \infty$, then the random variables $\chi(k)$ are uniformly integrable and

$$\mathbf{E}\chi(k) \to \frac{1}{a} \sum_{i=1}^{\infty} i \sum_{j=i}^{\infty} p_j = \frac{m_2 + a}{2a} \quad \text{as } k \to \infty. \tag{10.3.4}$$

Proof The uniform integrability follows from the inequalities $h(k) \le 1$,

$$\mathbf{P}(\chi(k) = j) = \sum_{i=0}^{k} h(k-i)\, p_{i+j} \le \sum_{i=j}^{\infty} p_i.$$

This implies (10.3.4) (see Sect. 6.1). □

Now we can state a refined version of the integral theorem that implies Theorem 10.2.2.


Theorem 10.3.2 If all $\tau_j$ are identically distributed and $\mathbf{E}\tau^2 = m_2 < \infty$, then

$$H(k) = \frac{k}{a} + \frac{m_2 + a}{2a^2} + o(1)$$

as $k \to \infty$.

The Proof immediately follows from the Wald identity

$$H(k) = \mathbf{E}\eta(k) = \frac{k + \mathbf{E}\chi(k)}{a}$$

and Corollary 10.3.1.

□
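The second-order term in Theorem 10.3.2 can be checked against the exact renewal function. The sketch below is illustrative only; it assumes identically distributed positive integer steps with the hypothetical law $p_1 = 0.9$, $p_{11} = 0.1$ and uses $H(k) = \sum_{j \ge 0} \mathbf{P}(T_j \le k) = \sum_{m=0}^{k} u(m)$.

```python
def renewal_function(p, k):
    """H(k) = E eta(k) = sum_{m=0}^{k} u[m] for i.i.d. positive integer steps."""
    u = [1.0] + [0.0] * k
    for m in range(1, k + 1):
        u[m] = sum(pj * u[m - j] for j, pj in p.items() if j <= m)
    return sum(u)


p = {1: 0.9, 11: 0.1}     # assumed step law (g.c.d. of the support is 1)
a = sum(s * ps for s, ps in p.items())
m2 = sum(s * s * ps for s, ps in p.items())
k = 400

exact = renewal_function(p, k)
approx = k / a + (m2 + a) / (2 * a * a)
print(exact, approx)

# Sanity check on a degenerate walk: tau = 1 gives T_j = j and H(k) = k + 1.
print(renewal_function({1: 1.0}, 10))
```

For the degenerate step $\tau \equiv 1$ one has $a = m_2 = 1$, and the right-hand side of the theorem gives $k + 1$ with no remainder, in agreement with the direct count.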


Remark 10.3.1 For the process $\eta^*(t)$ corresponding to nonzero times $\tau'_j$ required for components' renewals (mentioned in Remark 10.1.1), the reader can easily find, similarly to Theorem 10.3.1, not only the asymptotic value $p_{i+j}/a^*$ of the probability that at time $k \to \infty$ the current component has already worked for time $i$ and will still work for time $j$, but also the asymptotics of the probability that the component has been "under repair" for time $i$ and will stay in that state for time $j$, which is given by $p'_{i+j}/a^*$, where $p'_i = \mathbf{P}(\tau'_j = i)$, $a^* = \mathbf{E}(\tau_j + \tau'_j) = \mathbf{E}\tau^*_j$.


Now consider the question of under what circumstances the distribution of the random variable $\tau_1$ for the homogeneous process (i.e. the distribution of what one could denote by $\chi(\infty)$) will coincide with that of $\tau_j$ for $j \ge 2$. Such a coincidence is equivalent to the equality

$$p_i = \frac{1}{a} \sum_{j=i}^{\infty} p_j$$

for $i = 1, 2, \ldots$, or, which is the same, to

$$a(p_i - p_{i-1}) = -p_{i-1}, \qquad p_i = \frac{a-1}{a}\, p_{i-1}, \qquad p_1 = \frac{1}{a}, \qquad p_i = \frac{1}{a} \left( \frac{a-1}{a} \right)^{i-1}.$$

This means that the renewal process generated by the sequence of independent identically distributed random variables $\tau_1, \tau_2, \ldots$ is homogeneous if and only if $\tau_j$ (or, more precisely, $\tau_j - 1$) have the geometric distribution.
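The characterisation just obtained can be verified in exact arithmetic. The sketch below is a small illustration with assumed parameters (mean $a = 5/2$ for the geometric law, and a two-point law on $\{1, 2\}$ as a counterexample); it computes $q_i = a^{-1}\mathbf{P}(\tau \ge i)$ from (10.2.1) and compares it with $p_i$.

```python
from fractions import Fraction


def first_step_law(pmf, a, n):
    """q_i = (1/a) * P(tau >= i) for i = 1..n, with P(tau >= i) = 1 - sum_{j<i} p_j."""
    q, head = [], Fraction(0)
    for i in range(1, n + 1):
        q.append((1 - head) / a)
        head += pmf(i)
    return q


a = Fraction(5, 2)        # assumed mean of the geometric step


def geometric(i):
    """p_i = (1/a) * ((a - 1)/a)^(i - 1), i = 1, 2, ..."""
    return (1 / a) * (1 - 1 / a) ** (i - 1)


def two_point(i):
    """Counterexample: steps 1 and 2 with probability 1/2 each (mean 3/2)."""
    return Fraction(1, 2) if i in (1, 2) else Fraction(0)


q_geo = first_step_law(geometric, a, 20)
p_geo = [geometric(i) for i in range(1, 21)]
q_two = first_step_law(two_point, Fraction(3, 2), 2)
print(q_geo[:3], p_geo[:3], q_two)
```

For the geometric law the two lists coincide exactly, whereas for the two-point law $q = (2/3, 1/3)$ differs from $p = (1/2, 1/2)$, so the corresponding process with identically distributed steps is not homogeneous.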

Denote by $\gamma$ and $\chi$ the random variables having distribution (10.3.2). Using (10.3.1), it is not hard to show that $\gamma$ and $\chi$ are independent also only in the case when $\tau_j$, $j \ge 2$, have the geometric distribution. When all $\tau_j$, $j \ge 1$, have such a distribution, $\gamma(n)$ and $\chi(n)$ are also independent, and $\chi(n) \stackrel{d}{=} \tau_1$. These facts can be proved in exactly the same way as for the exponential distribution (see Sect. 10.4).

We now return to the general case and recall that if $\mathbf{E}\tau^2 < \infty$ then (see Corollary 10.3.1)

$$\mathbf{E}\chi(n) \to \frac{\mathbf{E}\tau^2 + a}{2a}.$$

This means, in particular, that if the distribution of $\tau$ is such that $\mathbf{E}\tau^2 > 2a^2 - a$, then, for large $n$, the excess mean value $\mathbf{E}\chi(n)$ will become greater than $\mathbf{E}\tau = a$.


10.4  The  Renewal  Theorem  and  the  Limiting  Behaviour 
of  the  Excess  and  Defect  in  the  Non-arithmetic  Case 


Recall that in this chapter by the non-arithmetic case we mean that there exists no $h > 0$ such that $\mathbf{P}\bigl(\bigcup_k \{\tau = kh\}\bigr) = 1$, where $k$ runs over all integers. To state the key renewal theorem in that case, we will need the notion of a directly integrable function.

Definition 10.4.1 A function $g(u)$ defined on $[0, \infty)$ is said to be directly integrable if:

(1) the function $g$ is Riemann integrable over any finite interval $[0, N]$;¹ and

(2) $\sum_k g_k < \infty$, where $g_k = \max_{k \le u \le k+1} |g(u)|$.

It is evident that any monotonically decreasing function $g(t) \downarrow 0$ having a finite Lebesgue integral

$$\int_0^{\infty} g(t)\, dt < \infty$$

is directly integrable. This also holds for differences of such functions.

The notion of directly integrable functions introduced in [12] differs somewhat from the one just defined, although it essentially coincides with it. It will be more convenient for us to use Definition 10.4.1, since it allows us to simplify the exposition to some extent and to avoid auxiliary arguments (see Appendix 9).


Theorem 10.4.1 (The key renewal theorem) Let $\tau_j \stackrel{d}{=} \tau \ge 0$ for $j \ge 2$ and $g$ be a directly integrable function. If the random variable $\tau$ is non-arithmetic, there exists $\mathbf{E}\tau = a > 0$, and the distribution of $\tau_1$ is arbitrary, then, as $t \to \infty$,

$$\int_0^t g(t - u)\, dH(u) \to \frac{1}{a} \int_0^{\infty} g(u)\, du. \tag{10.4.1}$$


There is a measure $\mathbf{H}$ on $[0, \infty)$ associated with $H$ that is defined by $\mathbf{H}((x, y]) := H(y) - H(x)$. The integral

$$\int_0^t g(t - u)\, dH(u)$$

in (10.4.1) can also be written as

$$\int_0^t g(t - u)\, \mathbf{H}(du).$$

It follows from (10.4.1), in particular, that, for any fixed $u$,

$$H(t) - H(t - u) \to \frac{u}{a}. \tag{10.4.2}$$

It is not hard to see that this relation, which is called the local renewal theorem, is equivalent to (10.4.1).

The proof of Theorem 10.4.1 is technically rather difficult, so we have placed it in Appendix 9. One can also find there refinements of Theorem 10.4.1 and its analogue in the case where $\tau$ has a density.

¹ (Footnote to condition (1) of Definition 10.4.1.) That is, the sums $n^{-1} \sum_k \underline{g}_k$ and $n^{-1} \sum_k \overline{g}_k$ have the same limits as $n \to \infty$, where $\underline{g}_k = \min_{u \in \Delta_k} g(u)$, $\overline{g}_k = \max_{u \in \Delta_k} g(u)$, $\Delta_k = [k\Delta, (k+1)\Delta)$, and $\Delta = N/n$. The usual definition of Riemann integrability over $[0, \infty)$ assumes that condition (1) of Definition 10.4.1 is satisfied and the limit of $\int_0^N g(u)\, du$ as $N \to \infty$ exists. This approach covers a wider class of functions than in Definition 10.4.1, allowing, for example, the existence of a sequence $t_k \to \infty$ such that $g(t_k) \to \infty$.

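The other assertions of Sects. 10.2 and 10.3 can also be extended to the non-arithmetic case without any difficulties. Let all $\tau_j$ be nonnegative.

Definition 10.4.2 In the non-arithmetic case, a renewal process $\eta(t)$ is called homogeneous (and is denoted by $\eta_0(t)$) if the distribution of the first jump has the form

$$\mathbf{P}(\tau_1 > x) = \frac{1}{a} \int_x^{\infty} \mathbf{P}(\tau > t)\, dt.$$

The ch.f. of $\tau_1$ equals

$$\varphi_{\tau_1}(\lambda) := \mathbf{E} e^{i\lambda\tau_1} = \frac{1}{a} \int_0^{\infty} e^{i\lambda x}\, \mathbf{P}(\tau > x)\, dx.$$

Since here we are integrating over $x > 0$, the integral exists (as well as the function $\varphi(\lambda) = \varphi_\tau(\lambda) := \mathbf{E} e^{i\lambda\tau}$) for all $\lambda$ with $\operatorname{Im}\lambda \ge 0$ (for $\lambda = i\alpha + v$, $-\infty < v < \infty$, $\alpha \ge 0$, the factor $e^{i\lambda x}$ is equal to $e^{-\alpha x} e^{ivx}$; see property 6 of ch.f.s). Therefore, for $\operatorname{Im}\lambda \ge 0$,

$$\varphi_{\tau_1}(\lambda) = \frac{\varphi(\lambda) - 1}{i\lambda a}.$$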


Theorem 10.4.2 For a homogeneous renewal process,

$$H_0(t) \equiv \mathbf{E}\eta_0(t) = 1 + \frac{t}{a}, \quad t \ge 0.$$

Proof This theorem can be proved in the same way as Theorem 10.2.1. Consider the Fourier-Stieltjes transform of the function $H_0(t)$:

$$r(\lambda) := \int_0^{\infty} e^{i\lambda x}\, dH_0(x).$$

Note that this transform exists for $\operatorname{Im}\lambda > 0$ and the uniqueness theorem established for ch.f.s remains true for it, since $\varphi^*(v) := r(i\alpha + v)/r(i\alpha)$, $-\infty < v < \infty$ (we put $\lambda = i\alpha + v$ for a fixed $\alpha > 0$) can be considered as the ch.f. of a certain distribution being the "Cramér transform" (see Chap. 9) of $H_0(t)$.

Since $\tau_j \ge 0$, one has

$$H_0(x) = \sum_{j=0}^{\infty} \mathbf{P}(T_j \le x).$$

As $H_0(t)$ has a unit jump at $t = 0$, we obtain

$$r(\lambda) = \int_0^{\infty} e^{i\lambda x}\, dH_0(x) = 1 + \sum_{j=0}^{\infty} \varphi_{\tau_1}(\lambda)\, \varphi^j(\lambda) = 1 + \frac{\varphi(\lambda) - 1}{i\lambda a} \cdot \frac{1}{1 - \varphi(\lambda)} = 1 - \frac{1}{i\lambda a}.$$

It is evident that this transform corresponds to the function $H_0(t) = 1 + t/a$. The theorem is proved. □

In the non-arithmetic case, one has the same connections between the homogeneous renewal process $\eta_0(t)$ and the limiting distribution of $\chi(t)$ and $\gamma(t)$ as we had in the arithmetic case. In the same way as in Sect. 10.3, we can derive from the renewal theorem the following.

Theorem 10.4.3 If $\tau \ge 0$ is non-arithmetic, $\mathbf{E}\tau = a$, and the distribution of $\tau_1 \ge 0$ is arbitrary, then the following limit exists:

$$\lim_{t \to \infty} \mathbf{P}(\gamma(t) > u,\ \chi(t) > v) = \frac{1}{a} \int_{u+v}^{\infty} \mathbf{P}(\tau > x)\, dx. \tag{10.4.3}$$


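Proof For $t > u$, by the total probability formula,

$$\begin{aligned}
\mathbf{P}(\gamma(t) > u,\ \chi(t) > v)
&= \mathbf{P}(\tau_1 > t + v) + \sum_{j=1}^{\infty} \int_0^{t-u} \mathbf{P}(\eta(t) = j + 1,\ T_j \in dx,\ \gamma(t) > u,\ \chi(t) > v) \\
&= \mathbf{P}(\tau_1 > t + v) + \sum_{j=1}^{\infty} \int_0^{t-u} \mathbf{P}(T_j \in dx,\ \tau_{j+1} > t - x + v) \\
&= \mathbf{P}(\tau_1 > t + v) - \mathbf{P}(\tau > t + v) + \int_0^{t-u} dH(x)\, \mathbf{P}(\tau > t - x + v).
\end{aligned} \tag{10.4.4}$$

Here the first two summands on the right-hand side converge to 0 as $t \to \infty$. By the renewal theorem for $g(x) = \mathbf{P}(\tau > x + u + v)$ (see (10.4.1)), the last integral converges to

$$\frac{1}{a} \int_0^{\infty} \mathbf{P}(\tau > x + u + v)\, dx.$$

The theorem is proved.

□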


As was the case in the previous section (see Theorem 10.3.2), in the case $\mathbf{E}\tau^2 = m_2 < \infty$ Theorem 10.4.3 allows us to refine the key renewal theorem.

Theorem 10.4.4 If all $\tau_j \stackrel{d}{=} \tau \ge 0$ are identically distributed and $\mathbf{E}\tau^2 = m_2 < \infty$, then, as $t \to \infty$,

$$H(t) = \frac{t}{a} + \frac{m_2}{2a^2} + o(1).$$

Proof From (10.4.4) for $u = 0$ and Lemma 10.2.3 it follows that $\chi(t)$ are uniformly integrable, for

$$\mathbf{P}(\chi(t) > v) = \int_0^t dH(x)\, \mathbf{P}(\tau > t - x + v) < (c_1 + c_2) \sum_{k \ge 0} \mathbf{P}(\tau > k + v), \tag{10.4.5}$$

and therefore by (4.4.3)

$$\mathbf{E}\chi(t) \to \frac{1}{a} \int_0^{\infty} \int_v^{\infty} \mathbf{P}(\tau > u)\, du\, dv = \frac{m_2}{2a}. \tag{10.4.6}$$

It remains to make use of Wald's identity. The theorem is proved.

□

One can add to relation (10.4.6) that, under the conditions of Theorem 10.4.4, one has

$$\mathbf{E}\chi^2(t) = o(t) \tag{10.4.7}$$

as $t \to \infty$. Indeed, (10.4.5) and Lemma 10.2.3 imply

$$\mathbf{P}(\chi(t) > v) < (c_1 + c_2) \sum_{k \le t} \mathbf{P}(\tau > k + v) < c \int_0^t \mathbf{P}(\tau > z + v)\, dz.$$

Further, integrating by parts, we obtain

$$\mathbf{E}\chi^2(t) = -\int_0^{\infty} v^2\, d\mathbf{P}(\chi(t) > v) = 2 \int_0^{\infty} v\, \mathbf{P}(\chi(t) > v)\, dv < 2c \int_0^t \int_0^{\infty} v\, \mathbf{P}(\tau > z + v)\, dv\, dz, \tag{10.4.8}$$

where the inner integral converges to zero as $z \to \infty$:

$$\int_0^{\infty} v\, \mathbf{P}(\tau > z + v)\, dv = \frac{1}{2} \int_0^{\infty} v^2\, d\mathbf{P}(\tau < z + v) < \frac{1}{2}\, \mathbf{E}(\tau^2;\ \tau > z) \to 0.$$

This and (10.4.8) imply (10.4.7).

Note also that if only $\mathbf{E}\tau$ exists, then, by Theorem 10.1.1, we have $\mathbf{E}\chi(t) = o(t)$ and, by Theorem 10.4.1 (or 10.4.3),

$$\mathbf{P}(\chi(t) > v) \to \frac{1}{a} \int_0^{\infty} \mathbf{P}(\tau > u + v)\, du.$$

Now let, as before, $\gamma$ and $\chi$ denote random variables distributed according to the limiting distribution (10.4.3). Similarly to the above, it is not hard to establish that if $\mathbf{E}\tau^k < \infty$, $k \ge 1$, then, as $t \to \infty$,

$$\mathbf{E}\chi^{k-1}(t) \to \mathbf{E}\chi^{k-1} < \infty, \qquad \mathbf{E}\chi^k(t) = o(t).$$

Further, it is seen from Theorem 10.4.3 that each of the random variables $\gamma$ and $\chi$ has density equal to $a^{-1}\mathbf{P}(\tau > x)$. The joint distribution of $\gamma$ and $\chi$ may have no density. If $\tau$ has density $f(x)$ then there exists a joint density of $\gamma$ and $\chi$ equal to $a^{-1} f(x + y)$. It also follows from Theorem 10.4.3 that $\gamma$ and $\chi$ are independent if and only if

$$\int_x^{\infty} \mathbf{P}(\tau > u)\, du = \frac{1}{\alpha}\, e^{-\alpha x}$$

for some $\alpha > 0$, i.e. independence takes place only for the exponential distribution $\tau \mathrel{\subset\!\!\!=} \boldsymbol{\Gamma}_\alpha$.

Moreover, for homogeneous renewal processes the coincidence of $\mathbf{P}(\tau_1 > x)$ and $\mathbf{P}(\tau > x)$ takes place only when $\tau \mathrel{\subset\!\!\!=} \boldsymbol{\Gamma}_\alpha$. In other words, the renewal process generated by a sequence of identically distributed random variables $\tau_1, \tau_2, \ldots$ will be homogeneous if and only if $\tau_j \mathrel{\subset\!\!\!=} \boldsymbol{\Gamma}_\alpha$. In that case $\eta_0(t)$ is called (see also Sect. 19.4) a Poisson process. This is because for such a process, for each $t$, the number of renewals $\eta_0(t) - 1$ in $(0, t]$ has the Poisson distribution with parameter $t/a$ (recall that $\mathbf{E}\eta_0(t) = 1 + t/a$ by Theorem 10.4.2).

The Poisson process has some other remarkable properties as well (see also Sect. 19.4). Clearly, one has $\chi(t) \mathrel{\subset\!\!\!=} \boldsymbol{\Gamma}_\alpha$ for such a process, and moreover, the variables $\gamma(t)$ and $\chi(t)$ are independent. Indeed, by (10.4.4), taking into account that $H(x)$ has a jump of magnitude 1 at the point $x = 0$, we obtain for $u < t$ that

$$\begin{aligned}
\mathbf{P}(\gamma(t) > u,\ \chi(t) > v) &= e^{-\alpha(t+v)} + \alpha \int_0^{t-u} e^{-\alpha(t-x+v)}\, dx \\
&= e^{-\alpha(u+v)} = \mathbf{P}(\gamma(t) > u)\, \mathbf{P}(\chi(t) > v); \\
\mathbf{P}(\gamma(t) = t,\ \chi(t) > v) &= \mathbf{P}(\tau_1 > t + v) = e^{-\alpha(t+v)} = \mathbf{P}(\gamma(t) = t)\, \mathbf{P}(\chi(t) > v); \\
\mathbf{P}(\gamma(t) > t) &= 0.
\end{aligned}$$

These relations also imply that the random variable $\tau_{\eta(t)} = \gamma(t) + \chi(t)$ has the same distribution as $\min(t, \tau_1) + \tau_2$, where $\tau_j \mathrel{\subset\!\!\!=} \boldsymbol{\Gamma}_\alpha$, $j = 1, 2$, are independent, so that $\tau_{\eta(t)} \mathrel{\subset\!\!\!\Rightarrow} \boldsymbol{\Gamma}_{\alpha,2}$ as $t \to \infty$.

The fact that $\gamma(t)$ and $\chi(t)$ are independent of each other deserves attention from the point of view of its interpretation. It means the following. The residual lifetime of the component operating at a given time $t$ has the same distribution as the lifetime of a new component (recall that $\tau_j \mathrel{\subset\!\!\!=} \boldsymbol{\Gamma}_\alpha$) and is independent of how long this component has already been working (which at first glance is a paradox). Since the lifetime distributions of devices consisting of large numbers of reliable elements are close to the exponential law (see Theorem 20.3.2), the above-mentioned fact is of significant practical interest.
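The factorisation $\mathbf{P}(\gamma(t) > u, \chi(t) > v) = e^{-\alpha(u+v)}$ lends itself to a quick Monte Carlo check. The following sketch is only an illustration: the rate $\alpha = 1$, the time $t = 5$, the thresholds $u = v = 0.5$, the number of trials and the seed are all arbitrary assumptions, and the agreement is statistical rather than exact.

```python
import math
import random


def defect_and_excess(t, rate, rng):
    """Simulate renewal epochs with Exp(rate) steps; return (gamma(t), chi(t))."""
    last, nxt = 0.0, rng.expovariate(rate)
    # Advance until the first epoch strictly beyond t, i.e. T_{eta(t)}.
    while nxt <= t:
        last, nxt = nxt, nxt + rng.expovariate(rate)
    return t - last, nxt - t


rng = random.Random(0)                      # fixed seed for reproducibility
rate, t, u, v, trials = 1.0, 5.0, 0.5, 0.5, 20000
samples = [defect_and_excess(t, rate, rng) for _ in range(trials)]

joint = sum(1 for g, c in samples if g > u and c > v) / trials
mean_excess = sum(c for _, c in samples) / trials
print(joint, math.exp(-rate * (u + v)))
print(mean_excess, 1 / rate)
```

The empirical joint tail should be close to $e^{-1}$, and the mean residual lifetime close to the full mean lifetime $1/\alpha$, which is the "paradox" discussed above.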

If $\tau_j$ can assume negative values as well, the problems related to the distributions of $\gamma(t)$ and $\chi(t)$ become much more complicated. To some extent such problems can be reduced to the case of nonnegative variables, since the distribution of $\chi(t)$ coincides with that of the variable $\chi^*(t)$ constructed from a sequence $\{\tau^*_j \ge 0\}$, where $\tau^*_j$ have the same distribution as $\chi(0)$. The distribution of $\chi(0)$ can be found using the methods of Chap. 12.

In particular, for random variables $\tau_1, \tau_2, \ldots$ taking values of both signs, Theorems 10.4.1 and 10.4.3 imply the following assertion.


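Corollary 10.4.1 Let $\tau_1, \tau_2, \ldots$ be non-arithmetic independent and identically distributed and $\mathbf{E}\tau_1 = a$. Then the following limit exists:

$$\lim_{t \to \infty} \mathbf{P}(\chi(t) > v) = \frac{1}{\mathbf{E}\chi(0)} \int_v^{\infty} \mathbf{P}(\chi(0) > t)\, dt, \quad v > 0.$$

For arithmetic $\tau_j$,

$$\lim_{k \to \infty} \mathbf{P}(\chi(k) = i) = \frac{1}{\mathbf{E}\chi(0)}\, \mathbf{P}(\chi(0) \ge i), \quad i > 0.$$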

10.5  The  Law  of  Large  Numbers  and  the  Central  Limit 
Theorem  for  Renewal  Processes 

In this section we return to the general case where $\tau_j$ are not necessarily identically distributed (cf. Sect. 10.1).


10.5.1  The  Law  of  Large  Numbers 

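First assume that $\tau_j \ge 0$ and put

$$a_k := \mathbf{E}\tau_k, \qquad A_n := \sum_{k=1}^{n} a_k.$$

Theorem 10.5.1 Let $\tau_k \ge 0$ be independent, $\tau_k - a_k$ be uniformly integrable, and $n^{-1} A_n \to a > 0$ as $n \to \infty$. Then, as $t \to \infty$,

$$\frac{\eta(t)}{t} \stackrel{p}{\to} \frac{1}{a}.$$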

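Proof The basic relation we shall use is the equality

$$\{\eta(t) > n\} = \{T_n \le t\},$$

which implies

$$\mathbf{P}\left( \frac{\eta(t)}{t} - \frac{1}{a} > \frac{\varepsilon}{a} \right) = \mathbf{P}\left( \eta(t) > \frac{t}{a}(1 + \varepsilon) \right) = \mathbf{P}(T_n \le t), \tag{10.5.1}$$

where for simplicity we assume that $n = \frac{t}{a}(1 + \varepsilon)$ is an integer. Further,

$$\mathbf{P}(T_n \le t) = \mathbf{P}\left( \frac{T_n}{n} \le \frac{a}{1 + \varepsilon} \right) = \mathbf{P}\left( \frac{T_n - A_n}{n} \le \frac{a}{1 + \varepsilon} - \frac{A_n}{n} \right) \le \mathbf{P}\left( \frac{T_n - A_n}{n} \le -\frac{a\varepsilon}{2} \right)$$

for $n$ large enough and $\varepsilon$ small enough. Applying the law of large numbers to the right-hand side of this relation (Theorem 8.3.3), we obtain that, for any $\varepsilon > 0$, as $t \to \infty$,

$$\mathbf{P}\left( \frac{\eta(t)}{t} - \frac{1}{a} > \frac{\varepsilon}{a} \right) \to 0.$$

The probability $\mathbf{P}\left( \frac{\eta(t)}{t} - \frac{1}{a} < -\frac{\varepsilon}{a} \right)$ can be bounded in the same way. The theorem is proved. □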


10.5.2  The  Central  Limit  Theorem 


Put

$$\sigma_k^2 := \mathbf{E}(\tau_k - a_k)^2 = \operatorname{Var}\tau_k, \qquad B_n^2 := \sum_{k=1}^{n} \sigma_k^2.$$

Theorem 10.5.2 Let $\tau_k \ge 0$ and the random variables $\tau_k - a_k$ satisfy the Lindeberg condition: for any $\delta > 0$ and $n \to \infty$,

$$\sum_{k=1}^{n} \mathbf{E}\bigl(|\tau_k - a_k|^2;\ |\tau_k - a_k| > \delta B_n\bigr) = o(B_n^2).$$

Let, moreover, there exist $a > 0$ and $\sigma > 0$ such that, as $n \to \infty$,

$$A_n := \sum_{k=1}^{n} a_k = an + o(\sqrt{n}), \qquad B_n^2 = \sigma^2 n + o(n). \tag{10.5.2}$$

Then

$$\frac{\eta(t) - t/a}{\sigma\sqrt{t/a^3}} \mathrel{\subset\!\Rightarrow} \Phi_{0,1}. \tag{10.5.3}$$

Proof From (10.5.1) we have

$$\mathbf{P}(\eta(t) > n) = \mathbf{P}(T_n \le t) = \mathbf{P}\Bigl(\frac{T_n - A_n}{B_n} \le \frac{t - A_n}{B_n}\Bigr). \tag{10.5.4}$$

Let $n$ vary as $t \to \infty$ so that

$$\frac{t - A_n}{B_n} \to v$$

for a fixed $v$. To find such an $n$, solve for $n$ the equation

$$\frac{t - an}{\sigma\sqrt{n}} = v.$$

This is a quadratic equation in $\sqrt{n}$, and its solution has the form

$$n = \frac{t}{a} \pm \frac{v\sigma}{a^2}\sqrt{at}\,\Bigl(1 + O\Bigl(\frac{1}{\sqrt{t}}\Bigr)\Bigr). \tag{10.5.5}$$

For such $n$, by (10.5.2),

$$\frac{t - A_n}{B_n} = \frac{\mp\frac{v\sigma}{a}\sqrt{at} + o(\sqrt{t})}{\sigma\sqrt{t/a}}\,(1 + o(1)) = \mp v + o(1).$$

This equality means that we have to choose the minus sign in (10.5.5). Therefore, by (10.5.4) and the central limit theorem,

$$\mathbf{P}(\eta(t) > n) = \mathbf{P}\Bigl(\frac{\eta(t) - t/a}{\sigma\sqrt{t/a^3}} > -v + o(1)\Bigr) \to \Phi(v) = 1 - \Phi(-v).$$

Changing $-v$ to $u$, by the continuity theorems (see Lemma 6.2.2) we get

$$\mathbf{P}\Bigl(\frac{\eta(t) - t/a}{\sigma\sqrt{t/a^3}} < u\Bigr) \to \Phi(u).$$

The theorem is proved. □
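The normalisation in (10.5.3) can be examined by simulation. The sketch below is an assumption-laden illustration, not part of the original argument: the $\tau_k$ are independent exponential variables with means alternating between 1 and 3, so that $a = 2$ and $\sigma^2 = \lim B_n^2/n = 5$; the sample mean and variance of the normalised values are then compared with 0 and 1.

```python
import math
import random

def eta(t, rng):
    """First k with T_k > t; tau_k ~ Exp with mean 1 (k odd) or mean 3 (k even)."""
    total, k = 0.0, 0
    while total <= t:
        k += 1
        total += rng.expovariate(1.0 if k % 2 else 1.0 / 3.0)
    return k

a, sigma2 = 2.0, 5.0      # assumed limits of A_n / n and B_n^2 / n for this example
t, runs = 2000.0, 400
rng = random.Random(1)
scale = math.sqrt(sigma2 * t / a**3)          # sigma * sqrt(t / a^3)
z = [(eta(t, rng) - t / a) / scale for _ in range(runs)]
mean = sum(z) / runs
var = sum((x - mean) ** 2 for x in z) / (runs - 1)
print(round(mean, 3), round(var, 3))  # expected to be near 0 and 1
```

The agreement is only approximate for finite $t$ and a finite number of runs.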


Remark 10.5.1 In Theorems 10.5.1 and 10.5.2 we considered the case where $A_n$ grows asymptotically linearly as $n \to \infty$. Then the centring parameter $t/a$ for $\eta(t)$ changes asymptotically linearly as well. However, nothing prevents us from considering a more general case where, say, $A_n \sim cn^{\alpha}$, $\alpha > 0$. Then the centring parameter for $\eta(t)$ will be the solution to the equation $cn^{\alpha} = t$, i.e. the function $(t/c)^{1/\alpha}$ (under the conditions of Theorem 10.5.2, in this case we have to assume that $B_n = o(A_n)$). The asymptotics of the renewal function will have the same form.

In order to extend the assertions of Theorems 10.5.1 and 10.5.2 to $\tau_j$ assuming values of both signs, we need some auxiliary assertions that are also of independent interest.


10.5.3  A  Theorem  on  the  Finiteness  of  the  Infimum  of  the 
Cumulative  Sums 

In this subsection we will consider identically distributed independent random variables $\tau_1, \tau_2, \ldots$ We first state the following simple assertion in the form of a lemma.

Lemma 10.5.1 One has $\mathbf{E}|\tau| < \infty$ if and only if

$$\sum_{j=1}^{\infty} \mathbf{P}(|\tau| > j) < \infty.$$

The Proof follows in an obvious way from the equality

$$\mathbf{E}|\tau| = \int_0^{\infty} \mathbf{P}(|\tau| > x)\,dx$$

and the inequalities

$$\sum_{j=1}^{\infty} \mathbf{P}(|\tau| > j) \le \int_0^{\infty} \mathbf{P}(|\tau| > x)\,dx \le 1 + \sum_{j=1}^{\infty} \mathbf{P}(|\tau| > j).$$

□

Let, as before,

$$T_n = \sum_{j=1}^{n} \tau_j.$$

Theorem 10.5.3 If $\tau_j \overset{d}{=} \tau$ are identically distributed and independent and $\mathbf{E}\tau > 0$, then the random variable $Z := \inf_{k \ge 0} T_k$ is proper (finite with probability 1).

Proof Let $\eta_1 = \eta(1)$ be the number of the first sum $T_k$ to exceed level 1. Consider the sequence $\{\tau_k^* = \tau_{\eta_1 + k}\}$ that, by Lemma 10.2.1, has the same distribution as $\{\tau_k\}$ and is independent of $\eta_1, \tau_1, \ldots, \tau_{\eta_1}$. For this sequence, denote by $\eta_2$ the subscript $k$ for which the sum $T_k^* = \sum_{j=1}^{k} \tau_j^*$ first exceeds level 1. It is clear that the random variables $\eta_1$ and $\eta_2$ are identically distributed and independent. Next, construct for the sequence $\{\tau_k^{**} = \tau_{\eta_1 + \eta_2 + k}\}$ the random variable $\eta_3$ following the same rule, and so on. As a result we will obtain a sequence of Markov times $\eta_1, \eta_2, \ldots$ that determine the times of “renewals” of the original sequence $\{T_k\}$, associated with attaining level 1.

Now set

$$Z_1 := \min_{k < \eta_1} T_k, \qquad Z_2 := \min_{k < \eta_2} T_k^*, \ \ldots$$

Clearly, the $Z_j$ are identically distributed and

$$Z = \inf\{Z_1,\ T_{\eta_1} + Z_2,\ T_{\eta_1 + \eta_2} + Z_3, \ldots\},$$

where by definition $T_{\eta_1} > 1$, $T_{\eta_1 + \eta_2} > 2$ and so on. Hence

$$\{Z < -N\} = \bigcup_{k=0}^{\infty} \{Z_{k+1} + T_{\eta_1 + \cdots + \eta_k} < -N\} \subset \bigcup_{k=0}^{\infty} \{Z_{k+1} + k < -N\},$$

$$\mathbf{P}(Z < -N) \le \sum_{k=0}^{\infty} \mathbf{P}(Z_{k+1} + k < -N) = \sum_{j=N}^{\infty} \mathbf{P}(Z_1 < -j).$$

This expression tends to 0 as $N \to \infty$ provided that $\mathbf{E}|Z_1| < \infty$ (see Lemma 10.5.1). It remains to verify the finiteness of $\mathbf{E}|Z_1|$, which follows from the finiteness of $\mathbf{E}\eta_1 = \mathbf{E}\eta(1) = H(1) < \infty$ (see Example 4.4.5) and the relations

$$\mathbf{E}|Z_1| \le \mathbf{E}\sum_{j=1}^{\eta_1} |\tau_j| = \mathbf{E}\eta_1\,\mathbf{E}|\tau_1| < \infty$$

(see Theorem 4.4.2). □
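The finiteness of the infimum can be seen in a small experiment. The Python sketch below is an illustration under assumed parameters (Gaussian steps with mean 0.5 and unit variance, which are not taken from the text): it records the running minimum of $T_k$ after 200 and after 5000 steps and reports how often the two coincide.

```python
import random

def running_min(steps, drift, rng, checkpoints):
    """Return min_{0<=k<=c} T_k at each checkpoint c for a Gaussian walk
    with the given positive drift (T_0 = 0)."""
    pos, low, out = 0.0, 0.0, {}
    for k in range(1, steps + 1):
        pos += rng.gauss(drift, 1.0)
        low = min(low, pos)
        if k in checkpoints:
            out[k] = low
    return out

rng = random.Random(2)
paths = [running_min(5000, 0.5, rng, {200, 5000}) for _ in range(200)]
stable = sum(p[200] == p[5000] for p in paths) / len(paths)
worst = min(p[5000] for p in paths)
print(stable, worst)
```

With positive drift the minimum is typically attained early and does not change afterwards, which is consistent with $Z$ being a proper random variable.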




10.5.4  Stochastic  Inequalities.  The  Law  of  Large  Numbers  and  the
Central  Limit  Theorem  for  the  Maximum  of  Sums  of
Non-identically  Distributed  Random  Variables  Taking
Values  of  Both  Signs


In this subsection we extend the assertions of some theorems of Chap. 8 to maxima of sums of random variables with a positive “mean drift”. To do this we will have to introduce some additional restrictions that are always satisfied when the summands are identically distributed. Here we will need the notion of stochastic inequalities (or inequalities in distribution). Let $\xi$ and $\zeta$ be given random variables.


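Definition 10.5.1 We will say that $\zeta$ majorises (minorises) $\xi$ in distribution and denote this by $\xi \overset{d}{\le} \zeta$ ($\xi \overset{d}{\ge} \zeta$) if, for all $t$,

$$\mathbf{P}(\xi \ge t) \le \mathbf{P}(\zeta \ge t) \qquad \bigl(\mathbf{P}(\xi \ge t) \ge \mathbf{P}(\zeta \ge t)\bigr).$$

Clearly, if $\xi \overset{d}{\le} \zeta$ then $-\xi \overset{d}{\ge} -\zeta$. We show that stochastic inequalities possess some other properties of ordinary inequalities.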


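Lemma 10.5.2 If $\{\xi_k\}_{k=1}^{\infty}$ and $\{\zeta_k\}_{k=1}^{\infty}$ are sequences of independent (in each sequence) random variables and $\xi_k \overset{d}{\le} \zeta_k$ then, for all $n$,

$$S_n \overset{d}{\le} Z_n, \qquad \overline{S}_n \overset{d}{\le} \overline{Z}_n,$$

where

$$S_n = \sum_{k=1}^{n} \xi_k, \qquad Z_n = \sum_{k=1}^{n} \zeta_k, \qquad \overline{S}_n = \max_{k \le n} S_k, \qquad \overline{Z}_n = \max_{k \le n} Z_k.$$

Similarly, if $\xi_k \overset{d}{\ge} \zeta_k$ then $\min_{k \le n} S_k \overset{d}{\ge} \min_{k \le n} Z_k$.

Proof Let $F_k(t) := \mathbf{P}(\xi_k < t)$ and $G_k(t) := \mathbf{P}(\zeta_k < t)$. Using quantile transformations $F_k^{(-1)}$ and $G_k^{(-1)}$ (see Definition 3.2.4) and a sequence of independent random variables $\{\omega_k\}_{k=1}^{\infty}$, $\omega_k \mathrel{\subset\!=} \mathbf{U}_{0,1}$, we can construct on a common probability space the sequences $\xi_k^* = F_k^{(-1)}(\omega_k)$ and $\zeta_k^* = G_k^{(-1)}(\omega_k)$ such that $\xi_k^* \overset{d}{=} \xi_k$ and $\zeta_k^* \overset{d}{=} \zeta_k$ (the distributions of $\xi_k^*$ and $\xi_k$ and of $\zeta_k^*$ and $\zeta_k$ coincide). Moreover, $\xi_k^* \le \zeta_k^*$, which is a direct consequence of the inequality $F_k(t) \ge G_k(t)$ for all $t$. Endowing with the superscript $*$ all the notations for sums and maximum of sums of random variables with asterisks, we obviously obtain that

$$S_n \overset{d}{=} S_n^* \le Z_n^* \overset{d}{=} Z_n, \qquad \overline{S}_n \overset{d}{=} \overline{S}_n^* \le \overline{Z}_n^* \overset{d}{=} \overline{Z}_n.$$

The last assertion of the lemma follows from the previous ones. The lemma is proved. □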
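The coupling used in the proof of Lemma 10.5.2 is easy to reproduce. The following Python sketch is an illustration under an assumed pair of laws (shifted exponential distributions with rates 2 and 1, chosen here only because their quantile functions are explicit): both sequences are built from the same uniform variables, and the pointwise ordering of the terms, sums and maxima of sums is then verified.

```python
import math
import random
from itertools import accumulate

def quantile_shifted_exp(u, rate):
    """Quantile transform of the law of X - 1, where X is exponential with the given rate."""
    return -math.log(1.0 - u) / rate - 1.0

rng = random.Random(3)
omega = [rng.random() for _ in range(1000)]            # common uniform variables omega_k
xi = [quantile_shifted_exp(u, 2.0) for u in omega]     # F(t) = 1 - exp(-2(t + 1)), t >= -1
zeta = [quantile_shifted_exp(u, 1.0) for u in omega]   # G(t) = 1 - exp(-(t + 1)) <= F(t)

S = list(accumulate(xi))            # S_n^*
Z = list(accumulate(zeta))          # Z_n^*
S_bar = list(accumulate(S, max))    # running maxima of S_k^*
Z_bar = list(accumulate(Z, max))    # running maxima of Z_k^*

pointwise = all(x <= z for x, z in zip(xi, zeta))
sums_ordered = all(s <= z for s, z in zip(S, Z))
maxima_ordered = all(s <= z for s, z in zip(S_bar, Z_bar))
print(pointwise, sums_ordered, maxima_ordered)
```

Since the inequalities hold on every trajectory of the coupled sequences, they hold in distribution for the original ones.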


Below we will need the following corollary of Theorem 10.5.3.

Lemma 10.5.3 Let $\xi_k$ be independent, $\xi_k \overset{d}{\ge} \zeta$ for all $k$ and $\mathbf{E}\zeta > 0$. Then, for all $n$, the random variable

$$D_n := \overline{S}_n - S_n \ge 0$$

is majorised in distribution by the random variable $-Z$: $D_n \overset{d}{\le} -Z$, where $Z := \inf_{k \ge 0} Z_k$, $Z_k := \sum_{j=1}^{k} \zeta_j$, and $\zeta_j$ are independent copies of $\zeta$.

Proof We have

$$-D_n = S_n - \max(0, S_1, \ldots, S_n) = S_n + \min(0, -S_1, \ldots, -S_n) = \min(0,\ \xi_n,\ \xi_n + \xi_{n-1},\ \ldots,\ S_n),$$

where, by the last assertion of Lemma 10.5.2,

$$\min(0,\ \xi_n,\ \xi_n + \xi_{n-1},\ \ldots,\ S_n) \overset{d}{\ge} \min_{k \le n} Z_k \ge Z, \qquad D_n \overset{d}{\le} -Z.$$

The fact that $Z$ is a proper random variable follows from Theorem 10.5.3 on the finiteness of the infimum of partial sums. The lemma is proved. □

If $\xi_k \overset{d}{=} \xi$ are identically distributed and $a = \mathbf{E}\xi > 0$, then we can put $\zeta = \xi$. The above reasoning shows that in this case the limit distribution of $\overline{S}_n - S_n$ as $n \to \infty$ exists and coincides with the distribution of the random variable $-Z$ (the random variables $\overline{S}_n - S_n$ themselves do not have a limit, and, by the way, neither do the variables $\frac{S_n - an}{\sigma\sqrt{n}}$ in the central limit theorem).

Lemma 10.5.3 shows that, for $\xi_k \overset{d}{\ge} \zeta$ and $\mathbf{E}\zeta > 0$, the random variables $\overline{S}_n$ and $S_n$ differ from each other by a proper random variable only. This makes the limit theorems for $\overline{S}_n$ and $S_n$ essentially the same.

We proceed to the law of large numbers and the central limit theorem for $\overline{S}_n$.

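Theorem 10.5.4 Let $a_k = \mathbf{E}\xi_k > 0$, $A_n = \sum_{k=1}^{n} a_k$ and $A_n \sim an$ as $n \to \infty$, $a > 0$. Let, moreover, $\xi_k - a_k$ be uniformly integrable for all $k$ and $\xi_k \overset{d}{\ge} \zeta$ with $\mathbf{E}\zeta > 0$. Then, as $n \to \infty$,

$$\frac{\overline{S}_n}{n} \xrightarrow{p} a.$$

Note that the left uniform integrability of $\xi_k - a_k$ follows from the inequalities $\xi_k \overset{d}{\ge} \zeta$.

Proof By Lemma 10.5.3,

$$\overline{S}_n = S_n + D_n, \quad \text{where } D_n \ge 0,\ D_n \overset{d}{\le} -Z. \tag{10.5.6}$$

Therefore,

$$\frac{\overline{S}_n}{n} = \frac{S_n - A_n}{n} + \frac{A_n}{n} + \frac{D_n}{n},$$

where by Theorem 8.3.3, as $n \to \infty$,

$$\frac{S_n - A_n}{n} \xrightarrow{p} 0.$$

It is also clear that

$$\frac{A_n}{n} \to a, \qquad \frac{D_n}{n} \xrightarrow{p} 0.$$

The theorem is proved. □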

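In addition to the notation from Theorem 10.5.4, put

$$\sigma_k^2 := \mathbf{E}(\xi_k - a_k)^2, \qquad B_n^2 := \sum_{k=1}^{n} \sigma_k^2.$$

Theorem 10.5.5 Let, for some $a > 0$ and $\sigma > 0$,

$$A_n = an + o(\sqrt{n}), \qquad B_n^2 = \sigma^2 n + o(n),$$

and let the random variables $\xi_k - a_k$ satisfy the Lindeberg condition, $\xi_k \overset{d}{\ge} \zeta$ with $\mathbf{E}\zeta > 0$. Then

$$\frac{\overline{S}_n - an}{\sigma\sqrt{n}} \mathrel{\subset\!\Rightarrow} \Phi_{0,1}. \tag{10.5.7}$$

Proof By virtue of (10.5.6),

$$\frac{\overline{S}_n - an}{\sigma\sqrt{n}} = \frac{S_n - A_n}{B_n} \cdot \frac{B_n}{\sigma\sqrt{n}} + \frac{A_n - an}{\sigma\sqrt{n}} + \frac{D_n}{\sigma\sqrt{n}}, \tag{10.5.8}$$

where, by the central limit theorem,

$$\frac{S_n - A_n}{B_n} \mathrel{\subset\!\Rightarrow} \Phi_{0,1}.$$

Moreover,

$$\frac{B_n}{\sigma\sqrt{n}} \to 1, \qquad \frac{A_n - an}{\sigma\sqrt{n}} \to 0, \qquad \frac{D_n}{\sigma\sqrt{n}} \xrightarrow{p} 0.$$

This and (10.5.8) imply (10.5.7). The theorem is proved. □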

10.5.5  Extension  of  Theorems  10.5.1  and  10.5.2  to  Random 
Variables  Assuming  Values  of  Both  Signs 

We return to renewal processes and limit theorems for them. In Theorems 10.5.1 and 10.5.2 we obtained the law of large numbers and the central limit theorem for the renewal process $\eta(t)$ defined in (10.1.1) with jumps $\tau_k \ge 0$. Now we drop the last assumption and assume that $\tau_j$ can take values of both signs.

Theorem 10.5.6 Let the conditions of Theorem 10.5.1 be met, the condition $\tau_k \ge 0$ being replaced with the condition $\tau_k \overset{d}{\ge} \zeta$ with $\mathbf{E}\zeta > 0$. Then

$$\frac{\eta(t)}{t} \xrightarrow{p} \frac{1}{a}. \tag{10.5.9}$$

If $\tau_k \overset{d}{=} \tau$ are identically distributed and $\mathbf{E}\tau > 0$, then we can put $\zeta = \tau$. Therefore Theorem 10.5.6 implies the following result.

Corollary 10.5.1 If $\tau_k$ are independent and identically distributed and $\mathbf{E}\tau = a > 0$, then (10.5.9) holds true.

Proof of Theorem 10.5.6 Here instead of (10.5.1) we should use the relation

$$\{\eta(t) > n\} = \{\overline{T}_n \le t\}, \qquad \overline{T}_n = \max_{k \le n} T_k, \qquad T_k = \sum_{j=1}^{k} \tau_j. \tag{10.5.10}$$

Then we repeat the argument from the proof of Theorem 10.5.1, changing in it $T_n$ to $\overline{T}_n$ and using Theorem 10.5.4, which implies that $T_n$ and $\overline{T}_n$ satisfy the law of large numbers. The theorem is proved. □

Theorem 10.5.7 Let the conditions of Theorem 10.5.2 be met, the condition $\tau_k \ge 0$ being replaced with the condition $\tau_k \overset{d}{\ge} \zeta$ with $\mathbf{E}\zeta > 0$. Then (10.5.3) holds true.

Proof Here we again have to use (10.5.10), instead of (10.5.1), and then repeat the argument proving Theorem 10.5.2 using Theorem 10.5.5, which implies that the distribution of $\frac{\overline{T}_n - an}{\sigma\sqrt{n}}$, as well as the distribution of $\frac{T_n - an}{\sigma\sqrt{n}}$, converges to the standard normal law $\Phi_{0,1}$. The theorem is proved. □

Remark 10.5.2 (An analogue of Remarks 8.3.3, 8.4.1 and 10.1.1) The assertions of Theorems 10.5.6 and 10.5.7 can be generalised as follows. Let $\tau_1$ be an arbitrary random variable and random variables $\tau_k^* := \tau_{1+k}$, $k \ge 1$, satisfy the conditions of Theorem 10.5.6 (Theorem 10.5.7). Then convergence (10.5.9) ((10.5.3)) still takes place.

Consider, for example, Theorem 10.5.7. Denote by $A_x$ the event

$$A_x := \Bigl\{\frac{\eta(t) - t/a}{\sigma\sqrt{t/a^3}} < x\Bigr\}.$$

Then the foregoing assertion follows from the relations

$$\mathbf{P}(A_x) = \mathbf{E}\bigl[\mathbf{P}(A_x \mid \tau_1);\ |\tau_1| \le N\bigr] + r_N,$$

where $r_N \le \mathbf{P}(|\tau_1| > N)$ can be made arbitrarily small by the choice of $N$, and by Theorem 10.5.7

$$\mathbf{P}(A_x \mid \tau_1) = \mathbf{P}\Bigl(\frac{\eta^*(t - \tau_1) - (t - \tau_1)/a}{\sigma\sqrt{(t - \tau_1)/a^3}} < x + o(1)\Bigr) \to \Phi(x)$$

as $t \to \infty$ for each fixed $\tau_1$, $|\tau_1| \le N$. Here $\eta^*(t)$ is the renewal process that corresponds to the sequence $\{\tau_k^*\}$. □


10.5.6  The  Local  Limit  Theorem 


If we again narrow our assumptions and return to identically distributed $\tau_k \overset{d}{=} \tau \ge 0$ then we can derive local theorems more precise than Theorem 10.5.2. In this subsection we will find an asymptotic representation for $\mathbf{P}(\eta(t) = n)$ as $t \to \infty$. We know from Theorem 10.5.2 what range of values of $n$ the bulk of the distribution of $\eta(t)$ is concentrated in. Therefore we will from the start consider not arbitrary $n$, but the values of $n$ that can be represented as

$$n = \Bigl[\frac{t}{a} + v\sigma\sqrt{\frac{t}{a^3}}\Bigr], \qquad \sigma^2 = \operatorname{Var}(\tau), \tag{10.5.11}$$

for “proper” values of $v$ ($[s]$ in (10.5.11) is the integer part of $s$), so that

$$\frac{t - an}{\sigma\sqrt{n}} = -v + O\Bigl(\frac{1}{\sqrt{t}}\Bigr) \tag{10.5.12}$$

(see (10.5.5)). For the proof, it will be more convenient to consider the probabilities $\mathbf{P}(\eta(t) = n + 1)$. Changing $n + 1$ to $n$ amends nothing in the argument below.

Theorem 10.5.8 If $\tau \ge 0$ is either non-lattice or arithmetic and $\operatorname{Var}(\tau) = \sigma^2 < \infty$, then, for the values of $n$ defined in (10.5.11), as $t \to \infty$,

$$\mathbf{P}(\eta(t) = n + 1) \sim \frac{a^{3/2}}{\sigma\sqrt{2\pi t}}\, e^{-v^2/2}, \tag{10.5.13}$$

where in the arithmetic case $t$ is assumed to be integer.

Proof First let, for simplicity, $\tau$ have a density and satisfy the conditions of the local limit Theorem 8.7.2. Then

$$\mathbf{P}(\eta(t) = n + 1) = \int_0^t \mathbf{P}(T_n \in du)\,\mathbf{P}(\tau > t - u), \tag{10.5.14}$$

where by Theorem 8.7.2, as $n \to \infty$,

$$\mathbf{P}\bigl(T_n - na \in d(u - na)\bigr) = \frac{du}{\sigma\sqrt{2\pi n}}\Bigl[\exp\Bigl\{-\frac{(u - na)^2}{2n\sigma^2}\Bigr\} + o(1)\Bigr]$$

uniformly in $u$. Change the variable $u = t - z$. Since for the values of $n$ we are dealing with one has (10.5.12), the exponential

$$\exp\Bigl\{-\frac{(u - na)^2}{2n\sigma^2}\Bigr\} = \exp\Bigl\{-\frac{(t - na - z)^2}{2n\sigma^2}\Bigr\}$$

remains “almost constant” and asymptotically equivalent to $e^{-v^2/2}$ for $|z| < N$, $N \to \infty$, $N = o(\sqrt{n})$. Hence the integral in (10.5.14) is asymptotically equivalent to

$$\frac{1}{\sigma\sqrt{2\pi n}}\, e^{-v^2/2} \int_0^{\infty} \mathbf{P}(\tau > z)\,dz = \frac{a}{\sigma\sqrt{2\pi n}}\, e^{-v^2/2}.$$

Since $n \sim t/a$ as $t \to \infty$, we obtain (10.5.13).

If $\tau$ has no density, but is non-lattice, then we should use the integro-local Theorem 8.7.1 for small $\Delta$ and, in a quite similar fashion, bound the integral in (10.5.14) (with $t$, which is a multiple of $\Delta$) from above and from below by the sums

$$\sum_{k=0}^{t/\Delta - 1} \mathbf{P}\bigl(T_n \in \Delta[k\Delta)\bigr)\,\mathbf{P}\bigl(\tau > t - (k+1)\Delta\bigr)$$

and

$$\sum_{k=0}^{t/\Delta - 1} \mathbf{P}\bigl(T_n \in \Delta[k\Delta)\bigr)\,\mathbf{P}(\tau > t - k\Delta),$$

respectively. For small $\Delta$ both bounds will be close to the right-hand side of (10.5.13).

If $\tau$ has an arithmetic distribution then we have to replace integral (10.5.14) with the corresponding sum and, for integer $u$ and $t$, make use of Theorem 8.7.3.

The theorem is proved. □
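A case in which (10.5.13) can be compared with an exact value is that of exponentially distributed $\tau$ with unit mean, chosen here as an assumption for illustration. Then $a = \sigma = 1$, and $\mathbf{P}(\eta(t) = n + 1) = \mathbf{P}(T_n \le t < T_{n+1})$ is the Poisson probability with parameter $t$ evaluated at $n$. The Python sketch below computes both sides.

```python
import math

def poisson_pmf(n, lam):
    """P(N = n) for N ~ Poisson(lam), computed via logarithms for stability."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

a, sigma = 1.0, 1.0           # assumed example: tau ~ Exp(1), mean 1 and variance 1
t, v = 400.0, 0.5
n = math.floor(t / a + v * sigma * math.sqrt(t / a**3))    # n as in (10.5.11)
exact = poisson_pmf(n, t / a)                               # P(eta(t) = n + 1)
approx = a**1.5 / (sigma * math.sqrt(2 * math.pi * t)) * math.exp(-v * v / 2)
print(n, exact, approx, exact / approx)
```

The ratio of the two values approaches 1 as $t$ grows; for moderate $t$ a discrepancy of the order of a few percent is to be expected.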


If we examine the arguments in the proof concerning the behaviour of the correction term, then, in addition to (10.5.13), we can also obtain the representation

$$\mathbf{P}(\eta(t) = n) = \frac{a^{3/2}}{\sigma\sqrt{2\pi t}}\, e^{-v^2/2} + o\Bigl(\frac{1}{\sqrt{t}}\Bigr) \tag{10.5.15}$$

uniformly in $v$ (or in $n$).


10.6  Generalised  Renewal  Processes 
10.6.1  Definition  and  Some  Properties 

Let, instead of the sequence $\{\tau_j\}$, there be given a sequence of two-dimensional independent vectors $(\tau_j, \xi_j)$, $\tau_j \ge 0$, having the same distribution as $(\tau, \xi)$. Let, as before,

$$S_k = \sum_{j=1}^{k} \xi_j, \qquad T_k = \sum_{j=1}^{k} \tau_j, \qquad S_0 = T_0 = 0,$$

$$\eta(t) = \min\{k : T_k > t\}, \qquad \nu(t) = \max\{k : T_k \le t\} = \eta(t) - 1.$$

Definition 10.6.1 The process

$$S_{(\nu)}(t) = qt + S_{\nu(t)}$$

is called a generalised renewal process with linear drift $q$.

The process $S_{(\nu)}(t)$, as well as $\nu(t)$, is right-continuous. Clearly, $S_{(\nu)}(t) = qt$ for $t < \tau_1$. At time $t = \tau_1$ the first jump in the process $S_{(\nu)}(t)$ occurs, which is of size $\xi_1$:

$$S_{(\nu)}(\tau_1 - 0) = q\tau_1, \qquad S_{(\nu)}(\tau_1) = q\tau_1 + \xi_1.$$

After that, on the interval $[T_1, T_2)$ the value of $S_{(\nu)}(t)$ varies linearly with slope $q$. At the point $T_2$, the second jump occurs, which is of size $\xi_2$, and so on.

Generalised renewal processes are evidently a generalisation of random walks $S_k$ (for $\tau_j \equiv 1$, $q = 0$) and renewal processes $\eta(t) = \nu(t) + 1$ (for $\xi_j \equiv 1$, $q = 0$). They are widespread in applications, as mathematical models of various physical systems.

Along with the process $S_{(\nu)}(t)$, we will consider generalised renewal processes of the form

$$S(t) = qt + S_{\eta(t)} = S_{(\nu)}(t) + \xi_{\eta(t)},$$

that are in a certain sense more convenient to analyse since $\eta(t)$ is a Markov time with respect to $\mathfrak{F}_n = \sigma(\tau_1, \ldots, \tau_n;\ \xi_1, \ldots, \xi_n)$ and has already been well studied.

The fact that the asymptotic properties of the processes $S(t)$ and $S_{(\nu)}(t)$, as $t \to \infty$, (the law of large numbers, the central limit theorem) are identical follows from the next assertion, which shows that the difference $S(t) - S_{(\nu)}(t)$ has a proper limiting distribution.


Lemma 10.6.1 If $\mathbf{E}\tau < \infty$, then the following limiting distribution exists

$$\lim_{t \to \infty} \mathbf{P}(\xi_{\eta(t)} < v) = \frac{\mathbf{E}(\tau;\ \xi < v)}{\mathbf{E}\tau}.$$

The lemma implies that $\xi_{\eta(t)}/b(t) \xrightarrow{p} 0$ for any function $b(t) \to \infty$ as $t \to \infty$.

Proof By virtue of the key renewal theorem,

$$\mathbf{P}(\xi_{\eta(t)} < v) = \sum_{k=0}^{\infty} \int_0^t \mathbf{P}(T_k \in du)\,\mathbf{P}(\tau > t - u,\ \xi < v) = \int_0^t dH(u)\,\mathbf{P}(\tau > t - u,\ \xi < v)$$

$$\to \frac{1}{\mathbf{E}\tau}\int_0^{\infty} \mathbf{P}(\tau > u,\ \xi < v)\,du = \frac{\mathbf{E}(\tau;\ \xi < v)}{\mathbf{E}\tau}.$$

The lemma is proved. □

As was already noted, $\eta(t)$ is a stopping time with respect to

$$\mathfrak{F}_n = \sigma(\tau_1, \ldots, \tau_n;\ \xi_1, \ldots, \xi_n).$$

Therefore, if $(\tau_j, \xi_j)$ are identically distributed, then by the Wald identity (see Theorem 4.4.2 and Example 4.4.5)

$$\mathbf{E}S(t) = qt + a_\xi\,\mathbf{E}\eta(t) \sim qt + \frac{a_\xi t}{a} \tag{10.6.1}$$

as $t \to \infty$, where $a_\xi = \mathbf{E}\xi$ and $a = \mathbf{E}\tau$. The second moments of $S(t)$ will be found in Sect. 15.2. The laws of large numbers for $S(t)$ will be established in Sect. 11.5.


10.6.2  The  Central  Limit  Theorem 


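In order to simplify the exposition, we first assume that the components $\tau_j$ and $\xi_j$ of the vectors $(\tau_j, \xi_j) \overset{d}{=} (\tau, \xi)$ are independent. Moreover, without losing generality, we assume that $q = 0$.

Theorem 10.6.1 Let there exist $\sigma^2 = \operatorname{Var}\tau < \infty$, $\sigma_\xi^2 = \operatorname{Var}(\xi) < \infty$ with $\sigma + \sigma_\xi > 0$. If the coordinates $\tau$ and $\xi$ are independent then, as $t \to \infty$,

$$\frac{S(t) - rt}{\sigma_S\sqrt{t}} \mathrel{\subset\!\Rightarrow} \Phi_{0,1},$$

where $r = a_\xi/a$ and $\sigma_S^2 = a^{-1}(\sigma_\xi^2 + r^2\sigma^2) = a^{-1}\operatorname{Var}(\xi - r\tau)$. The same assertion holds true for $S_{(\nu)}(t)$ as well.

Proof If one of the values of $\sigma$ and $\sigma_\xi$ is zero, then the assertion of the theorem follows from Theorems 8.2.1 and 10.5.2. Therefore we can assume that $\sigma > 0$ and $\sigma_\xi > 0$. Denote by $\mathfrak{G} = \sigma(\tau_1, \tau_2, \ldots)$ the $\sigma$-algebra generated by the sequence $\{\tau_j\}$ and by $A_t \in \mathfrak{G}$ the set

$$A_t = \bigl\{|\eta(t) - t/a| < t^{1/2 + \varepsilon}\bigr\}, \qquad \varepsilon \in (0, 1/2).$$

Since by the central limit theorem $\mathbf{P}(A_t) \to 1$ as $t \to \infty$, for any trajectory $\eta(\cdot)$ in $A_t$ we have $\eta(t) \to \infty$ as $t \to \infty$, and the random variables

$$Z(t) = \frac{S(t) - a_\xi\,\eta(t)}{\sigma_\xi\sqrt{\eta(t)}}$$

are asymptotically normal with parameters $(0, 1)$ by the independence of $\{\xi_j\}$ and $\{\tau_j\}$. In other words, on the sets $A_t$,

$$\mathbf{E}\bigl(e^{i\lambda Z(t)} \mid \mathfrak{G}\bigr) \to e^{-\lambda^2/2} \quad \text{as } t \to \infty.$$

Since

$$a_\xi\,\eta(t) = rt + r\sigma\sqrt{t/a}\;\zeta_t, \qquad \zeta_t := \frac{\eta(t) - t/a}{\sigma\sqrt{t/a^3}}, \qquad \text{and} \quad \eta(t) \sim \frac{t}{a}$$

on the sets $A_t \in \mathfrak{G}$, we also have on the sets $A_t$ the relation

$$\mathbf{E}\Bigl(\exp\Bigl\{\frac{i\lambda\bigl(S(t) - rt - r\sigma\zeta_t\sqrt{t/a}\bigr)}{\sigma_\xi\sqrt{t/a}}\Bigr\} \Bigm| \mathfrak{G}\Bigr) \to e^{-\lambda^2/2}.$$

Since the random variables $\zeta_t$ and $\eta(t)$ are measurable with respect to $\mathfrak{G}$, the corresponding factor can be taken outside of the conditional expectation, so that

$$\mathbf{E}\Bigl(\exp\Bigl\{\frac{i\lambda(S(t) - rt)}{\sigma_\xi\sqrt{t/a}}\Bigr\} \Bigm| \mathfrak{G}\Bigr) \sim \exp\Bigl\{-\frac{\lambda^2}{2} + \frac{i\lambda r\sigma}{\sigma_\xi}\,\zeta_t\Bigr\}.$$

Hence, since $\zeta_t \mathrel{\subset\!\Rightarrow} \Phi_{0,1}$ by Theorem 10.5.2,

$$\mathbf{E}\exp\Bigl\{\frac{i\lambda(S(t) - rt)}{\sigma_\xi\sqrt{t/a}}\Bigr\} = o(1) + \mathbf{E}\Bigl(\exp\Bigl\{-\frac{\lambda^2}{2} + \frac{i\lambda r\sigma}{\sigma_\xi}\,\zeta_t\Bigr\};\ A_t\Bigr) = o(1) + \exp\Bigl\{-\frac{\lambda^2}{2}\Bigl[1 + \Bigl(\frac{r\sigma}{\sigma_\xi}\Bigr)^2\Bigr]\Bigr\}.$$

This means that

$$\frac{S(t) - rt}{\sqrt{t}} \mathrel{\subset\!\Rightarrow} \Phi_{0,\sigma_S^2},$$

where

$$\sigma_S^2 = \frac{\sigma_\xi^2}{a}\Bigl[1 + \Bigl(\frac{r\sigma}{\sigma_\xi}\Bigr)^2\Bigr] = a^{-1}\bigl[\sigma_\xi^2 + r^2\sigma^2\bigr].$$

The assertion corresponding to $S_{(\nu)}(t)$ follows from Lemma 10.6.1. The theorem is proved. □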
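The variance formula of Theorem 10.6.1 can be checked by simulation. The sketch below rests on assumptions made only for illustration: $q = 0$, $\tau \sim \mathrm{Exp}(1)$ and $\xi \sim N(2, 1)$ independent, so that $a = 1$, $\sigma^2 = 1$, $a_\xi = 2$, $\sigma_\xi^2 = 1$, $r = 2$ and $\sigma_S^2 = a^{-1}(\sigma_\xi^2 + r^2\sigma^2) = 5$.

```python
import math
import random

def S_of_t(t, rng):
    """S(t) = S_{eta(t)} with q = 0, tau ~ Exp(1) and xi ~ N(2, 1) independent."""
    clock, total = 0.0, 0.0
    while clock <= t:                     # stops right after the first T_k > t
        clock += rng.expovariate(1.0)     # tau_j
        total += rng.gauss(2.0, 1.0)      # xi_j
    return total

r, sigma_S2 = 2.0, 5.0        # values implied by the assumed distributions
t, runs = 500.0, 500
rng = random.Random(4)
z = [(S_of_t(t, rng) - r * t) / math.sqrt(sigma_S2 * t) for _ in range(runs)]
mean = sum(z) / runs
var = sum((x - mean) ** 2 for x in z) / (runs - 1)
print(round(mean, 3), round(var, 3))  # expected to be near 0 and 1
```

A sample variance near 1 indicates that both sources of randomness, the jumps $\xi_j$ and the number of jumps $\eta(t)$, are accounted for by $\sigma_S^2$.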


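Note that Theorems 8.2.1 and 10.5.2 are special cases of Theorem 10.6.1. If $a_\xi = 0$, then the limiting distribution of $S(t)/\sqrt{t}$ coincides with that of $S_{[t/a]}/\sqrt{t}$ and does not depend on $\sigma$.

Now consider the general case where $\tau$ and $\xi$ are, generally speaking, dependent. Since $T_{\eta(t)} = t + \chi(t)$, we have the representation

$$S(t) - rt = Z_{\eta(t)} + r\chi(t), \tag{10.6.2}$$

where

$$Z_n = \sum_{j=1}^{n} \zeta_j, \qquad \zeta_j = \xi_j - r\tau_j, \qquad \mathbf{E}\zeta_j = 0, \qquad \frac{\chi(t)}{\sqrt{t}} \xrightarrow{p} 0
$$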

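as $t \to \infty$ ($\chi(t)$ has a proper limiting distribution as $t \to \infty$). Moreover, we will use yet another Wald identity

$$\mathbf{E}Z_{\eta(t)}^2 = d^2\,\mathbf{E}\eta(t), \qquad d^2 = \mathbf{E}\zeta^2, \qquad \zeta = \xi - r\tau, \tag{10.6.3}$$

that is derived below in Sect. 15.2.

Theorem 10.6.2 Let $(\tau_j, \xi_j) \overset{d}{=} (\tau, \xi)$ be independent identically distributed and such that $\sigma^2 = \operatorname{Var}(\tau) < \infty$ and $\sigma_\xi^2 = \operatorname{Var}(\xi) < \infty$ exist. Then

$$\frac{S(t) - rt}{\sigma_S\sqrt{t}} \mathrel{\subset\!\Rightarrow} \Phi_{0,1},$$

where $r = a_\xi/a$ and $\sigma_S^2 = a^{-1}d^2$. The random variables $\frac{S_{(\nu)}(t) - rt}{\sigma_S\sqrt{t}}$ and $\frac{Z_{\eta(t)}}{\sigma_S\sqrt{t}}$ have the same limiting distribution.

Proof It is seen from (10.6.2) that it suffices to prove that

$$\frac{Z_{\eta(t)}}{\sigma_S\sqrt{t}} \mathrel{\subset\!\Rightarrow} \Phi_{0,1}.$$

The main contribution to $Z_{\eta(t)}$ comes from $Z_m$ with $m = [t/a - 2N\sqrt{t}]$, $N \to \infty$, $N = o(\sqrt{t})$, where

$$\frac{Z_m}{\sigma_S\sqrt{t}} = \frac{\sqrt{a}\,Z_m}{d\sqrt{t}} = \frac{Z_m}{d\sqrt{m}}\sqrt{\frac{ma}{t}} \mathrel{\subset\!\Rightarrow} \Phi_{0,1}.$$

The remainder $Z_{\eta(t)} - Z_m$, for each fixed

$$T_m \in I_N := [t - 3aN\sqrt{t},\ t - aN\sqrt{t}], \qquad \mathbf{P}(T_m \in I_N) \to 1,$$

has the same distribution as $Z_{\eta(t - T_m)}$, and its variance (see (10.6.3)) is equal to

$$d^2\,\mathbf{E}\eta(t - T_m) \sim d^2\,\frac{t - T_m}{a} \le 3d^2 N\sqrt{t} = o(t).$$

Since $\mathbf{E}Z_{\eta(t - T_m)} = 0$, we have

$$\frac{Z_{\eta(t - T_m)}}{\sqrt{t}} \xrightarrow{p} 0 \tag{10.6.4}$$

as $t \to \infty$. The theorem is proved. □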

Note that, for $N \to \infty$ slowly enough, relation (10.6.4) can be derived using not (10.6.3), but the law of large numbers for generalised renewal processes that is obtained in Sect. 11.5.

Theorem 10.6.1 could be proved in a somewhat different way, with the help of the local Theorem 10.5.8. We will illustrate this approach by the proof of the integro-local theorem for $S(t)$.


10.6.3  The  Integro-Local  Theorem 

In this section we will obtain the integro-local theorem for $S(t)$ in the case of non-lattice $\xi$. In a quite similar way we can obtain local theorems for densities (if they exist) and for the probability $\mathbf{P}(S(t) = k)$ for $q = 0$ for arithmetic $\xi$.

Theorem 10.6.3 Let the conditions of Theorem 10.6.1 hold and, moreover, $\xi$ be non-lattice. Then, for any fixed $\Delta > 0$, as $t \to \infty$,

$$\mathbf{P}\bigl(S(t) - rt \in \Delta[x)\bigr) = \frac{\Delta}{\sigma_S\sqrt{t}}\,\varphi\Bigl(\frac{x}{\sigma_S\sqrt{t}}\Bigr) + o\Bigl(\frac{1}{\sqrt{t}}\Bigr), \tag{10.6.5}$$

where the remainder term $o(1/\sqrt{t})$ is uniform in $x$.

Proof Since $\xi$ is non-lattice, one has $\sigma_\xi > 0$. If $\sigma = 0$ then the assertion of the theorem follows from Theorem 8.7.1. Therefore we will assume that $\sigma > 0$. By the independence of $\{\xi_j\}$ and $\{\tau_j\}$,

$$\mathbf{P}\bigl(S(t) - rt \in \Delta[x)\bigr) = \sum_{n=1}^{\infty} \mathbf{P}(\eta(t) = n)\,\mathbf{P}\bigl(S_n - rt \in \Delta[x)\bigr) = \sum_{n \in M_t} + \sum_{n \notin M_t},$$

where $M_t = \{n : |n - t/a| < t^{1/2}N(t)\}$, $N(t) \to \infty$, $N(t) = o(\sqrt{t})$ as $t \to \infty$. We know the asymptotics of both factors of the terms in the sum from Theorems 8.7.1 and 10.5.8 (see also (10.5.15)). It remains to do the summation, which is unfortunately somewhat cumbersome. At the same time, it presents no substantial difficulties, so we will sketch this part of the proof. Put $an - t =: u$,

$$P_1(t) := \frac{\Delta}{\sigma_\xi\sqrt{2\pi n}} \exp\Bigl\{-\frac{(x - ru)^2}{2n\sigma_\xi^2}\Bigr\}, \qquad P_2(t) := \frac{a^{3/2}}{\sigma\sqrt{2\pi t}} \exp\Bigl\{-\frac{u^2}{2\sigma^2 n}\Bigr\}.$$

Then

$$\mathbf{P}\bigl(S_n - rt \in \Delta[x)\bigr) = P_1(t) + o\Bigl(\frac{1}{\sqrt{t}}\Bigr), \qquad \mathbf{P}(\eta(t) = n) = P_2(t) + o\Bigl(\frac{1}{\sqrt{t}}\Bigr)$$

for $n \in M_t$ and $N(t) \to \infty$ slowly enough as $t \to \infty$. Clearly,

$$\sum_{n \notin M_t} = o\Bigl(\frac{1}{\sqrt{t}}\Bigr).$$

Since the sums of $P_1(t)$ and $P_2(t)$ are bounded in $n$ by a constant, we have

$$\sum_{n \in M_t} = o\Bigl(\frac{1}{\sqrt{t}}\Bigr) + \sum_{n \in M_t} P_1(t)P_2(t).$$

The exponent in the product $P_1(t)P_2(t)$, taken with the negative sign, is equal to

$$\frac{1}{2n}\Bigl[\frac{(x - ru)^2}{\sigma_\xi^2} + \frac{u^2}{\sigma^2}\Bigr] \sim \frac{a}{2t}\Bigl[\frac{(d^2u - rx\sigma^2)^2}{d^2\sigma^2\sigma_\xi^2} + \frac{x^2}{d^2}\Bigr],$$

where $d^2 = r^2\sigma^2 + \sigma_\xi^2$. Since, for $x = o(\sqrt{t}\,N(t))$,

$$\sum_{n \in M_t} \frac{a^{3/2}d}{\sqrt{2\pi t}\,\sigma\sigma_\xi} \exp\Bigl\{-\frac{a(d^2u - rx\sigma^2)^2}{2td^2\sigma^2\sigma_\xi^2}\Bigr\} \to 1$$

as $t \to \infty$ and this sum does not exceed $1 + o(1)$ for all $x$ (this is an integral sum that corresponds to the integral of the density of the normal law), it is easy to derive (10.6.5) from the foregoing. □

We  will  continue  the  study  of  generalised  renewal  processes  in  Sect.  11.5. 


Chapter  11 

Properties  of  the  Trajectories  of  Random  Walks.
Zero-One  Laws


Abstract The chapter begins with Sect. 11.1 establishing the Borel-Cantelli and Kolmogorov zero-one laws, and also the zero-one law for exchangeable sequences. The concepts of lower and upper functions are introduced. Section 11.2 contains the first Kolmogorov inequality and several theorems on convergence of random series. Section 11.3 presents Kolmogorov’s Strong Law of Large Numbers and Wald’s identity for stopping times. Sections 11.4 and 11.5 are devoted to the Strong Law of Large Numbers for independent non-identically distributed random variables, and to the Strong Law of Large Numbers for generalised renewal processes, respectively.

11.1  Zero-One  Laws.  Upper  and  Lower  Functions 

Let, as before, $S_n = \sum_{j=1}^{n} \xi_j$ be the sums of independent random variables $\xi_1, \xi_2, \ldots$ In this chapter we will consider properties of the “whole” trajectories of random walks $\{S_n\}$.

The first limit theorem we proved for the distribution of the sums of independent identically distributed random variables was the law of large numbers: $S_n/n \xrightarrow{p} \mathbf{E}\xi$. One could ask whether the whole trajectory $S_n/n, S_{n+1}/(n+1), \ldots$, starting from some $n$, will be close to $\mathbf{E}\xi$ with a high probability. That is, whether, for any $\varepsilon > 0$, we will have

$$\lim_{n \to \infty} \mathbf{P}\Bigl(\sup_{k \ge n}\Bigl|\frac{S_k}{k} - \mathbf{E}\xi\Bigr| < \varepsilon\Bigr) = 1. \tag{11.1.1}$$

This is clearly a problem on almost sure convergence, or convergence with probability 1. A similar question arises concerning generalised renewal processes discussed in Sect. 10.6.

Assertion (11.1.1), which is called the strong law of large numbers and is to be proved in this chapter, is a special case of the so-called zero-one laws. As the first such law, we will now present the Borel-Cantelli zero-one law.

11.1.1  Zero-One  Laws 

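Theorem 11.1.1 Let $\{A_n\}_{n=1}^{\infty}$ be a sequence of events on a probability space $(\Omega, \mathfrak{F}, \mathbf{P})$, and let $A$ be the event that infinitely many events $A_k$ occur, i.e. $A = \bigcap_{n=1}^{\infty} \bigcup_{k \ge n} A_k$ (the event $A$ consists of those $\omega$ that belong to infinitely many $A_k$).

If $\sum_{k=1}^{\infty} \mathbf{P}(A_k) < \infty$, then $\mathbf{P}(A) = 0$. If $\sum_{k=1}^{\infty} \mathbf{P}(A_k) = \infty$ and the events $A_k$ are independent, then $\mathbf{P}(A) = 1$.

Proof Assume that $\sum_{k=1}^{\infty} \mathbf{P}(A_k) < \infty$. Denote by $\eta = \sum_{k=1}^{\infty} I(A_k)$ the number of occurrences of events $A_k$. Then $\mathbf{E}\eta = \sum_{k=1}^{\infty} \mathbf{P}(A_k) < \infty$ which certainly means that $\eta$ is a proper random variable: $\mathbf{P}(\eta < \infty) = 1 - \mathbf{P}(A) = 1$.

If $A_k$ are independent and $\sum_{k=1}^{\infty} \mathbf{P}(A_k) = \infty$, then, since $\overline{A}_k = \Omega \setminus A_k$ are also independent, we have

$$\mathbf{P}(A) = \lim_{n \to \infty} \mathbf{P}\Bigl(\bigcup_{k=n}^{\infty} A_k\Bigr) = \lim_{n \to \infty} \mathbf{P}\Bigl(\Omega - \bigcap_{k=n}^{\infty} \overline{A}_k\Bigr)$$

$$= 1 - \lim_{n \to \infty} \mathbf{P}\Bigl(\bigcap_{k=n}^{\infty} \overline{A}_k\Bigr) = 1 - \lim_{n \to \infty} \lim_{m \to \infty} \mathbf{P}\Bigl(\bigcap_{k=n}^{m} \overline{A}_k\Bigr)$$

$$= 1 - \lim_{n \to \infty} \prod_{k=n}^{\infty} \bigl(1 - \mathbf{P}(A_k)\bigr).$$

Using the inequality $\ln(1 - x) \le -x$ we obtain that

$$\prod_{k=n}^{\infty} \bigl(1 - \mathbf{P}(A_k)\bigr) \le \exp\Bigl\{-\sum_{k=n}^{\infty} \mathbf{P}(A_k)\Bigr\}.$$

Hence

$$\prod_{k=n}^{\infty} \bigl(1 - \mathbf{P}(A_k)\bigr) \le e^{-\infty} = 0, \qquad \mathbf{P}(A) = 1.$$

The theorem is proved. □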
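The key step of the second part of the proof, the behaviour of the product $\prod_{k=n}^{m}(1 - \mathbf{P}(A_k))$ as $m \to \infty$, can be computed directly. The Python sketch below uses two assumed example sequences that are not in the text: $\mathbf{P}(A_k) = 1/k$ (divergent series) and $\mathbf{P}(A_k) = 1/k^2$ (convergent series).

```python
def tail_product(prob, n, m):
    """prod_{k=n}^{m} (1 - P(A_k)): for independent events, the probability
    that none of A_n, ..., A_m occurs."""
    out = 1.0
    for k in range(n, m + 1):
        out *= 1.0 - prob(k)
    return out

n = 10
upper_limits = (10**2, 10**4, 10**6)
divergent = [tail_product(lambda k: 1.0 / k, n, m) for m in upper_limits]
convergent = [tail_product(lambda k: 1.0 / k**2, n, m) for m in upper_limits]
print(divergent)    # equals (n - 1) / m, which tends to 0
print(convergent)   # equals (n - 1)(m + 1) / (n m), which stays near (n - 1) / n
```

In the first case the product vanishes in the limit, so infinitely many $A_k$ occur with probability 1; in the second case it stays bounded away from 0, in agreement with $\mathbf{P}(A) = 0$.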


Remark 11.1.1 It follows from Theorem 11.1.1 that, for independent events $A_k$, the assertions that $\mathbf{E}\eta < \infty$ and that $\mathbf{P}(\eta < \infty) = 1$ are equivalent to each other. Although in one direction this relationship is obvious, in the opposite direction it is quite meaningful. It implies, in particular, that if $\eta < \infty$ with probability 1, but $\mathbf{E}\eta = \infty$, then $A_k$ are necessarily dependent.

Note also that the argument proving the first part of the theorem has already been used for the same purpose in the proof of Theorem 6.1.1.

Assume that $\{\xi_n\}_{n=1}^{\infty}$ is a sequence of independent random variables given on $(\Omega, \mathfrak{F}, \mathbf{P})$. Denote, as before, by $\sigma(\xi_1, \ldots, \xi_n)$ the $\sigma$-algebra generated by the first $n$ random variables $\xi_1, \ldots, \xi_n$, and by $\sigma(\xi_{n+1}, \ldots)$ the $\sigma$-algebra generated by the random variables $\xi_{n+1}, \xi_{n+2}, \ldots$


Definition 11.1.1 An event $A$ is said to be a tail event if $A \in \sigma(\xi_{n+1}, \ldots)$ for any $n > 0$.

For example, the event

$$A = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} \{\xi_k > N\}$$

meaning that there occurred infinitely many events $\{\xi_k > N\}$ is clearly a tail event.

Theorem 11.1.2 (Kolmogorov zero-one law) If $A$ is a tail event, then either $\mathbf{P}(A) = 0$ or $\mathbf{P}(A) = 1$.

Proof Since $A$ is a tail event, $A \in \sigma(\xi_{n+1}, \ldots)$, $n \ge 0$. Therefore the event $A$ is independent of the $\sigma$-algebra $\sigma(\xi_1, \ldots, \xi_n)$ for any $n$. Hence (see Theorem 3.4.3) the event $A$ is independent of the $\sigma$-algebra $\sigma(\xi_1, \ldots)$. Since $A \in \sigma(\xi_1, \ldots)$, it is independent of itself:

$$\mathbf{P}(A) = \mathbf{P}(AA) = \mathbf{P}(A)\mathbf{P}(A).$$

But this is only possible if $\mathbf{P}(A) = 0$ or 1. The theorem is proved. □

Put $\overline{S} = \sup\{0, S_1, S_2, \ldots\}$, where $S_n = \sum_{k=1}^{n} \xi_k$. An example of an application of the above theorem is given by the following

Corollary 11.1.1 If $\xi_k$, $k = 1, 2, \ldots$, are independent, then either $\mathbf{P}(\overline{S} = \infty) = 1$ or $\mathbf{P}(\overline{S} < \infty) = 1$.

The Proof follows from the fact that $\{\overline{S} = \infty\}$ is a tail event. Indeed, for any $n$

$$\{\overline{S} = \infty\} = \{\sup(S_{n-1}, S_n, \ldots) = \infty\} = \{\sup(0,\ S_n - S_{n-1},\ S_{n+1} - S_{n-1}, \ldots) = \infty\} \in \sigma(\xi_n, \ldots). \qquad □$$

Further examples of tail events can be obtained if we consider, for a sequence of independent variables $\xi_1, \xi_2, \ldots$, the event {the series $\sum_{k=1}^{\infty} \xi_k$ is convergent}. Theorem 11.1.2 means that the probability of that event can only be 0 or 1.

If we consider the power series $\sum_{k=1}^{\infty} \xi_k z^k$ where $\xi_k$ are independent, we will see that the convergence radius $\rho = \bigl(\limsup_{k \to \infty} |\xi_k|^{1/k}\bigr)^{-1}$ of this series is a random variable measurable with respect to the $\sigma$-algebra $\sigma(\xi_n, \ldots)$ for any $n$ ($\{\rho < x\} \in \sigma(\xi_n, \ldots)$, $0 \le x \le \infty$). Such random variables are also called tail random variables. Since by the foregoing one has $F_\rho(x) = \mathbf{P}(\rho < x) = 0$ or 1, this implies that $\rho$, as well as any other tail random variable, must be equal to a constant with probability 1.

Under the assumption that the elements of the sequence $\{\xi_k\}_{k=1}^{\infty}$ are not only
independent but also identically distributed, Kolmogorov’s zero-one law was extended
by Hewitt and Savage to a wider class of events.

Let $\omega = (x_1, x_2, \ldots)$ be an element of the sample space $(\mathbb{R}^\infty, \mathfrak{B}^\infty, \mathbf{P})$ for the
sequence $\xi = (\xi_1, \xi_2, \ldots)$ ($\mathbb{R}^\infty$ is a countable direct product of the real lines $\mathbb{R}_k$,
$k = 1, 2, \ldots$; $\mathfrak{B}^\infty = \sigma(\xi_1, \ldots)$ is generated by the sets $\prod_{k=1}^{N} B_k \in \sigma(\xi_1, \ldots, \xi_N)$,
where $B_k \in \sigma(\xi_k)$ are Borel sets on the lines $\mathbb{R}_k$).

Definition 11.1.2 An event $A \in \mathfrak{B}^\infty$ is said to be exchangeable if

$$(x_1, x_2, \ldots, x_{n-1}, x_n, x_{n+1}, \ldots) \in A$$

implies that $(x_n, x_2, \ldots, x_{n-1}, x_1, x_{n+1}, \ldots) \in A$ for every $n \ge 1$. It is evident that
this condition of membership automatically extends to any permutations of finitely
many components. Examples of exchangeable events are given by tail events.

Theorem 11.1.3 (Zero-one law for exchangeable events) If $\xi_k$ are independent and
identically distributed and $A$ is an exchangeable event, then either $\mathbf{P}(A) = 0$ or
$\mathbf{P}(A) = 1$.

Proof By the approximation theorem (Sect. 3.5), for any $A \in \mathfrak{B}^\infty$ there exists a
sequence of events $A_n \in \sigma(\xi_1, \ldots, \xi_n)$ such that

$$\mathbf{P}(A_n \overline{A} \cup \overline{A}_n A) \to 0$$

as $n \to \infty$.

Introduce the transformation

$$T_n \omega = T_n(x_1, x_2, \ldots) = (x_{n+1}, \ldots, x_{2n}, x_1, \ldots, x_n, x_{2n+1}, \ldots)$$

and put $B_n = T_n A_n$. If $A$ is exchangeable, then $T_n A = A$ and, for any $B \in \mathfrak{B}^\infty$, one
has $\mathbf{P}(T_n B) = \mathbf{P}(B)$ since $\xi_j$ are independent and identically distributed. Therefore
$\mathbf{P}(B_n A) = \mathbf{P}(T_n A_n A) = \mathbf{P}(A_n A)$, and hence $B_n$ will also approximate $A$, which
obviously implies that $C_n = A_n B_n$ will have the same approximation property. By
independence of $A_n$ and $B_n$, this means that

$$\mathbf{P}(A) = \lim_{n\to\infty} \mathbf{P}(A_n B_n) = \lim_{n\to\infty} \mathbf{P}^2(A_n) = \mathbf{P}^2(A).$$

The theorem is proved. □


11.1.2  Lower  and  Upper  Functions 

Theorem 11.1.3 implies the following interesting fact, the statement of which
requires the next definition.

Definition 11.1.3 For a sequence of random variables $\{\eta_n\}_{n=1}^{\infty}$, a numerical
sequence $\{a_n\}_{n=1}^{\infty}$ is said to be an upper sequence (function) if, with probability 1,
there occur only finitely many events $\{\eta_n > a_n\}$. A sequence $\{a_n\}_{n=1}^{\infty}$ is said to be a
lower sequence (function) if, with probability 1, there occur infinitely many events
$\{\eta_n > a_n\}$.

Corollary 11.1.2 If $\xi_k$ are independent and identically distributed, then any
sequence $\{a_n\}$ is either upper or lower for the sequence of sums $\{S_n\}_{n=1}^{\infty}$ with
$S_n = \sum_{k=1}^{n} \xi_k$.

In other words, one cannot find an “intermediate” sequence $\{a_n\}$ such that the
probability of the event $A = \{S_n > a_n \text{ infinitely often}\}$ would be equal, say, to 1/2.

Proof To prove the corollary, it suffices to notice that the event $A$ is exchangeable,
because swapping $\xi_1$ and $\xi_n$ in the realisation $(\xi_1, \xi_2, \ldots)$ influences the behaviour
of the first $n$ sums $S_1, \ldots, S_n$ only. □

A similar fact holds, of course, for the sequence of random variables $\{\xi_n\}_{n=1}^{\infty}$
itself, but, unlike the above corollary, that assertion can be proved more easily, since
$B = \{\xi_n > a_n \text{ infinitely often}\}$ is a tail event.

Remark 11.1.2 In regard to the properties of upper and lower sequences for sums
$\{S_n\}$ we also note here the following. If $\mathbf{P}(\xi_k = c) \ne 1$, and $\{a_n\}$ is an upper (lower)
sequence for $\{S_n\}$, then, for any fixed $k \ge 0$ and $v$, the sequence $\{b_n = a_{n+k} + v\}_{n=1}^{\infty}$
is also upper (lower) for $\{S_n\}$. This is a consequence of the following relations. Let
$v_1 > v_2$ be such that

$$\mathbf{P}(\xi > v_1) > 0, \qquad \mathbf{P}(\xi < v_2) > 0.$$

Then, for the upper sequence $\{a_n\}$ and the event $A = \{S_n > a_n \text{ infinitely many times}\}$,
we have

$$0 = \mathbf{P}(A) \ge \mathbf{P}(\xi_1 > v_1)\,\mathbf{P}(A \mid \xi_1 > v_1)
\ge \mathbf{P}(\xi_1 > v_1)\,\mathbf{P}(S_n > a_{n+1} - v_1 \text{ infinitely many times}).$$

This implies that the second factor on the right-hand side equals 0, and hence the
sequence $\{a_{n+1} - v_1\}$ is also an upper sequence. On the other hand, if $\xi' \stackrel{d}{=} \xi$ is
independent of $\{\xi_k\}$ then

$$0 = \mathbf{P}(A) \ge \mathbf{P}(\xi' + S_n > \xi' + a_n \text{ infinitely many times};\ \xi' < v_2)$$
$$\ge \mathbf{P}(\xi < v_2)\,\mathbf{P}(S_{n+1} > a_n + v_2 \text{ infinitely many times})$$
$$= \mathbf{P}(\xi < v_2)\,\mathbf{P}(S_n > a_{n-1} + v_2 \text{ infinitely many times}).$$

Here the second factor on the right-hand side equals 0, and hence the sequence
$\{a_{n-1} + v_2\}$ is also upper. Combining these assertions as many times as necessary,
we find that the sequence $\{a_{n+k} + v\}$ is upper for any given $k$ and $v$. □

From the above remark it follows, in particular, that the quantities $\limsup_{n\to\infty} S_n$
and $\liminf_{n\to\infty} S_n$ cannot both be finite for a sequence of sums of independent
identically distributed random variables that are not zeros with probability 1. Indeed,
the event $B = \{\limsup_{n\to\infty} S_n \in (a, b)\}$ is exchangeable and therefore $\mathbf{P}(B) = 0$ or
$\mathbf{P}(B) = 1$ by virtue of the zero-one law. If $\mathbf{P}(B)$ were equal to 1, $(b, b, \ldots)$ would be
an upper sequence for $\{S_n\}$. But, by our remark, $(a, a, \ldots)$ would then be an upper
sequence as well, which would mean that

$$\mathbf{P}\Bigl(\limsup_{n\to\infty} S_n \le a\Bigr) = 1,$$

which contradicts the assumption $\mathbf{P}(B) = 1$. □

The reader can also derive from Theorem 11.1.3 that, for any sequences $\{a_n\}$
and $\{b_n\}$, the random variables

$$\limsup_{n\to\infty} \frac{S_n - a_n}{b_n} \quad\text{and}\quad \liminf_{n\to\infty} \frac{S_n - a_n}{b_n}$$

are constant with probability 1.


11.2  Convergence  of  Series  of  Independent  Random  Variables 

In the present section we will discuss in more detail convergence of series of
independent random variables. We already know that such series converge with
probability 1 or 0. We are interested in conditions ensuring convergence.

First of all we answer the following interesting question. It is well known that the
series $\sum_{n=1}^{\infty} n^{-\alpha}$ is divergent for $\alpha \le 1$, while the alternating series $\sum_{n=1}^{\infty} (-1)^n n^{-\alpha}$
converges for any $\alpha > 0$ (the difference between neighbouring elements is of order
$\alpha n^{-\alpha-1}$). What can be said about the behaviour of the series $\sum_{n=1}^{\infty} \delta_n n^{-\alpha}$, where
$\delta_n$ are identically distributed and independent with $\mathbf{E}\delta_n = 0$ (for instance, $\delta_n = \pm 1$
with probabilities 1/2)?

One of the main approaches to studying such problems is based on elucidating
the relationship between a.s. convergence and the simpler notion of convergence
in probability. It is known that, generally speaking, convergence in probability
$\zeta_n \xrightarrow{p} \zeta$ does not imply a.s. convergence. However, in our situation when
$\zeta_n = S_n := \sum_{k=1}^{n} \xi_k$, $\xi_k$ being independent, this is not the case. The main assertion
of the present section is the following.

Theorem 11.2.1 If $\xi_k$ are independent and $S_n = \sum_{k=1}^{n} \xi_k$, then convergence of $S_n$
in probability implies a.s. convergence of $S_n$.

We will prove that $S_n$ is a Cauchy sequence. To do this, we will need the following
inequality.

Lemma 11.2.1 (The First Kolmogorov inequality) If $\xi_j$ are independent and, for
some $b > 0$ and all $j \le n$,

$$\mathbf{P}(|S_n - S_j| \ge b) \le p < 1,$$

then

$$\mathbf{P}\Bigl(\max_{j \le n} |S_j| \ge x\Bigr) \le \frac{1}{1-p}\,\mathbf{P}(|S_n| > x - b). \tag{11.2.1}$$

Corollary 11.2.1 If $\mathbf{E}\xi_j = 0$ then

$$\mathbf{P}\Bigl(\max_{j \le n} |S_j| > x\Bigr) \le 2\,\mathbf{P}\bigl(|S_n| > x - \sqrt{2\operatorname{Var}(S_n)}\bigr).$$

Kolmogorov actually established this last inequality (Lemma 11.2.1 is an
insignificant extension of it). It follows from (11.2.1) with $p = 1/2$, since by the
Chebyshev inequality

$$\mathbf{P}\bigl(|S_n - S_j| \ge \sqrt{2\operatorname{Var}(S_n)}\bigr) \le \frac{\operatorname{Var}(S_n - S_j)}{2\operatorname{Var}(S_n)} \le \frac12.$$

Proof of Lemma 11.2.1 Let

$$\eta := \min\{k \ge 1 : |S_k| \ge x\}.$$

Put $A_j := \{\eta = j\}$, $j = 1, 2, \ldots$. Clearly, $A_j$ are disjoint events and hence

$$\mathbf{P}(|S_n| > x - b) \ge \sum_{j=1}^{n} \mathbf{P}(|S_n| > x - b;\ A_j) \ge \sum_{j=1}^{n} \mathbf{P}(|S_n - S_j| < b;\ A_j).$$

(The last inequality holds because the event $\{|S_n - S_j| < b\}A_j$ implies
$\{|S_n| > x - b\}A_j$.) But $A_j \in \sigma(\xi_1, \ldots, \xi_j)$ and $\{|S_n - S_j| < b\} \in \sigma(\xi_{j+1}, \ldots, \xi_n)$. Therefore
these two events are independent and

$$\mathbf{P}(|S_n| > x - b) \ge \sum_{j=1}^{n} \mathbf{P}(A_j)\,\mathbf{P}(|S_n - S_j| < b)
\ge (1 - p)\sum_{j=1}^{n} \mathbf{P}(A_j) = (1 - p)\,\mathbf{P}\Bigl(\max_{j \le n} |S_j| \ge x\Bigr).$$

The lemma is proved. □
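
Inequality (11.2.1) can be checked exactly on a small case. The following is only an illustrative sketch, not part of the original argument: it assumes a symmetric walk with steps ±1 (so all $2^n$ paths are equally likely) and enumerates them; the function name `check_first_kolmogorov_inequality` and the choice $n = 8$, $b = 3$ are made up for the illustration.

```python
from fractions import Fraction
from itertools import product


def check_first_kolmogorov_inequality(n, b, x):
    """Exact check of Lemma 11.2.1 for a symmetric +-1 walk of length n.

    Returns (lhs, rhs, p) where
      lhs = P(max_{j<=n} |S_j| >= x),
      rhs = P(|S_n| > x - b) / (1 - p),
      p   = max_{j<=n} P(|S_n - S_j| >= b).
    Raises ValueError when p = 1, because the lemma then says nothing.
    """
    paths = list(product((-1, 1), repeat=n))
    weight = Fraction(1, len(paths))      # every path is equally likely

    count_max = 0                         # paths with max |S_j| >= x
    count_end = 0                         # paths with |S_n| > x - b
    count_inc = [0] * (n + 1)             # paths with |S_n - S_j| >= b, j = 1..n
    for steps in paths:
        sums = [0]
        for step in steps:
            sums.append(sums[-1] + step)  # sums[j] = S_j
        if max(abs(s) for s in sums[1:]) >= x:
            count_max += 1
        if abs(sums[n]) > x - b:
            count_end += 1
        for j in range(1, n + 1):
            if abs(sums[n] - sums[j]) >= b:
                count_inc[j] += 1

    p = max(count_inc[1:]) * weight
    if p >= 1:
        raise ValueError("hypothesis p < 1 fails for this b")
    lhs = count_max * weight
    rhs = count_end * weight / (1 - p)
    return lhs, rhs, p


if __name__ == "__main__":
    for x in (2, 3, 4, 5):
        lhs, rhs, p = check_first_kolmogorov_inequality(n=8, b=3, x=x)
        print(x, float(lhs), float(rhs), float(p))
```

Exact rational arithmetic is used so that the comparison `lhs <= rhs` is not affected by rounding.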


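Proof of Theorem 11.2.1 It suffices to prove that $\{S_n\}$ is a.s. a Cauchy sequence, i.e.
that, for any $\varepsilon > 0$,

$$\mathbf{P}\Bigl(\sup_{n \ge m} |S_n - S_m| > 2\varepsilon\Bigr) \to 0 \tag{11.2.2}$$

as $m \to \infty$. Let

$$A^{\varepsilon}_{n,m} := \{|S_n - S_m| > \varepsilon\}, \qquad A^{\varepsilon}_{m} := \bigcup_{n \ge m} A^{\varepsilon}_{n,m}.$$

Then relation (11.2.2) can be written as

$$\mathbf{P}(A^{2\varepsilon}_{m}) \to 0 \tag{11.2.3}$$

as $m \to \infty$.

Since $\{S_n\}$ is a Cauchy sequence in probability, one has

$$p_{m,M} := \sup_{m \le n \le M} \mathbf{P}(A^{\varepsilon}_{n,M}) \to 0$$

as $m \to \infty$ and $M \to \infty$, so that $p_{m,M} < 1/2$ for all $m$ and $M$ large enough. For
such $m$ and $M$ we have by Lemma 11.2.1, for $b = \varepsilon$ and $x = 2\varepsilon$, that

$$\mathbf{P}\Bigl(\sup_{m \le n \le M} |S_n - S_m| > 2\varepsilon\Bigr) = \mathbf{P}\Bigl(\bigcup_{n=m+1}^{M} A^{2\varepsilon}_{n,m}\Bigr)
\le \frac{1}{1 - p_{m,M}}\,\mathbf{P}(A^{\varepsilon}_{M,m}) \le 2\,\mathbf{P}(A^{\varepsilon}_{M,m}).$$

By the properties of probability,

$$\mathbf{P}(A^{2\varepsilon}_{m}) = \lim_{M\to\infty} \mathbf{P}\Bigl(\bigcup_{n=m+1}^{M} A^{2\varepsilon}_{n,m}\Bigr) \le 2 \limsup_{M\to\infty} \mathbf{P}(A^{\varepsilon}_{M,m}). \tag{11.2.4}$$

Denote by $S$ the limit (in probability) of the sequence $S_n$, and

$$B^{\varepsilon}_{n} := \{|S_n - S| > \varepsilon\}.$$

Then $\mathbf{P}(B^{\varepsilon}_{n}) \to 0$ as $n \to \infty$, $A^{\varepsilon}_{M,m} \subset B^{\varepsilon/2}_{M} \cup B^{\varepsilon/2}_{m}$, and by (11.2.4)

$$\mathbf{P}(A^{2\varepsilon}_{m}) \le 2\,\mathbf{P}(B^{\varepsilon/2}_{m}) \to 0 \quad\text{as } m \to \infty.$$

Relation (11.2.3), and hence the assertion of the theorem, are proved. □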


Corollary 11.2.2 If $\mathbf{E}\xi_k = 0$ and $\sum_{k=1}^{\infty} \operatorname{Var}(\xi_k) < \infty$, then $S_n$ converges a.s.

Proof The assertion follows immediately from Theorem 11.2.1 and the fact that
$\{S_n\}$ is a Cauchy sequence in mean quadratic ($\mathbf{E}(S_n - S_m)^2 = \sum_{k=m+1}^{n} \operatorname{Var}(\xi_k) \to 0$
as $m \to \infty$ and $n \to \infty$) and hence in probability. □

It turns out that if $\mathbf{E}\xi_k = 0$ and $|\xi_k| < c$ for all $k$, then the condition
$\sum \operatorname{Var}(\xi_k) < \infty$ is necessary and sufficient for a.s. convergence of $S_n$.¹

Corollary 11.2.2 also contains an answer to the question posed at the beginning
of the section about convergence of $\sum \delta_n n^{-\alpha}$, where $\delta_n$ are independent and
identically distributed and $\mathbf{E}\delta_n = 0$.

Corollary 11.2.3 The series $\sum \delta_n a_n$ converges with probability 1 if $\operatorname{Var}(\delta_k) =
\sigma^2 < \infty$ and $\sum a_n^2 < \infty$.

Thus we obtain that the series $\sum \delta_n n^{-\alpha}$, where $\delta_n = \pm 1$ with probabilities 1/2,
is convergent if and only if $\alpha > 1/2$.
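
The threshold $\alpha = 1/2$ can be seen numerically through the mean-square Cauchy quantity used in the proof of Corollary 11.2.2. The sketch below is an illustration under the assumption $\delta_n = \pm 1$ (so $\operatorname{Var}(\delta_n) = 1$) and $a_n = n^{-\alpha}$; the helper name `tail_variance` is hypothetical. It computes $\mathbf{E}(S_n - S_m)^2 = \sum_{k=m+1}^{n} k^{-2\alpha}$ over dyadic blocks $(m, 2m]$.

```python
def tail_variance(m, n, alpha):
    """Return E(S_n - S_m)^2 = sum_{k=m+1}^{n} k^(-2*alpha) for unit-variance signs."""
    return sum(k ** (-2.0 * alpha) for k in range(m + 1, n + 1))


if __name__ == "__main__":
    # For alpha > 1/2 the block variances shrink as m grows;
    # for alpha = 1/2 every block (m, 2m] contributes at least 1/2.
    for alpha in (1.0, 0.75, 0.5):
        blocks = [tail_variance(m, 2 * m, alpha) for m in (10, 100, 1000, 10000)]
        print(alpha, ["%.4f" % v for v in blocks])
```

The non-vanishing blocks at $\alpha = 1/2$ only show that the sufficient condition of Corollary 11.2.3 fails there; the divergence itself rests on the necessity statement for bounded summands quoted above.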

An  extension  of  Corollary  11.2.2  is  given  by  the  following. 


Corollary 11.2.4 (The two series theorem) A sufficient condition for a.s. convergence
of the series $\sum \xi_n$ is that the series $\sum \mathbf{E}\xi_n$ and $\sum \operatorname{Var}(\xi_n)$ are convergent.

The Proof is obvious, for the sequences $\sum_{k=1}^{n} \mathbf{E}\xi_k$ and $\sum_{k=1}^{n} (\xi_k - \mathbf{E}\xi_k)$ converge
a.s. by Corollary 11.2.2. □


¹ For more detail, see e.g. [31].

It is not hard to see that, using the terminology of Sect. 11.1, the strong law of large
numbers (11.1.1) means that, for any $\varepsilon > 0$, the sequence $\{\varepsilon n\}_{n=1}^{\infty}$ is an upper one
for both sequences $\{S_n\}$ and $\{-S_n\}$ only if $\mathbf{E}\xi_1 = 0$.

We will derive the strong law of large numbers as a corollary of Theorem 10.5.3
on finiteness of the infimum of sums of random variables.

Let, as before, $\xi_1, \xi_2, \ldots$ be independent and identically distributed, $\xi_k \stackrel{d}{=} \xi$.

Theorem 11.3.1 (Kolmogorov’s Strong Law of Large Numbers) A necessary and
sufficient condition for $S_n/n \xrightarrow{a.s.} a$ is that there exists $\mathbf{E}\xi_k = a$.


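Proof Sufficiency. Assume, without loss of generality, that $\mathbf{E}\xi_k = 0$. Then it follows
from Theorem 10.5.3 that the random variable $Z^{(\varepsilon)} = \inf_{k \ge 0}(S_k + \varepsilon k)$ is proper
for any $\varepsilon > 0$ ($S_k + \varepsilon k$ is a sum of random variables $\xi_k + \varepsilon$ with $\mathbf{E}(\xi_k + \varepsilon) > 0$).
Therefore,

$$\mathbf{P}\Bigl(\inf_{k \ge n} \frac{S_k}{k} < -2\varepsilon\Bigr) \le \mathbf{P}\Bigl(\bigcup_{k \ge n} \{S_k + \varepsilon k < -\varepsilon n\}\Bigr) \le \mathbf{P}(Z^{(\varepsilon)} < -\varepsilon n) \to 0$$

as $n \to \infty$. In a similar way we find that

$$\mathbf{P}\Bigl(\sup_{k \ge n} \frac{S_k}{k} > 2\varepsilon\Bigr) \to 0$$

as $n \to \infty$.

Since $\mathbf{P}(\sup_{k \ge n} |S_k/k| > 2\varepsilon)$ does not exceed the sum of the above two
probabilities, we obtain that $S_n/n \xrightarrow{a.s.} 0$.

Necessity. Note that

$$\frac{\xi_n}{n} = \frac{S_n}{n} - \frac{n-1}{n} \cdot \frac{S_{n-1}}{n-1} \xrightarrow{a.s.} 0,$$

so that the event $\{|\xi_n/n| > 1\}$ occurs finitely often with probability 1. By the
Borel-Cantelli zero-one law, this means that $\sum_{n=1}^{\infty} \mathbf{P}(|\xi_n/n| > 1) < \infty$ or, which is the
same, $\sum_{n=1}^{\infty} \mathbf{P}(|\xi| > n) < \infty$. Therefore, by Lemma 10.5.1, $\mathbf{E}|\xi| < \infty$ and with necessity
$\mathbf{E}\xi = a$. The theorem is proved. □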
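
The passage from $\sum_n \mathbf{P}(|\xi| > n) < \infty$ to $\mathbf{E}|\xi| < \infty$ in the necessity part rests on a standard comparison of the tail sum with the tail integral. The lines below are a sketch of that comparison, offered as a reminder; they are not a quotation of Lemma 10.5.1, whose exact wording is not reproduced in this section.

```latex
% Tail-integral formula for a nonnegative random variable:
\mathbf{E}|\xi| = \int_0^\infty \mathbf{P}(|\xi| > t)\,dt .
% Since t \mapsto \mathbf{P}(|\xi| > t) is nonincreasing, on each interval [n, n+1]
% it lies between its values at the right and left end points:
\sum_{n=1}^{\infty} \mathbf{P}(|\xi| > n)
  \;\le\; \int_0^\infty \mathbf{P}(|\xi| > t)\,dt
  \;\le\; \sum_{n=0}^{\infty} \mathbf{P}(|\xi| > n)
  \;\le\; 1 + \sum_{n=1}^{\infty} \mathbf{P}(|\xi| > n).
% Hence \mathbf{E}|\xi| < \infty if and only if \sum_{n \ge 1} \mathbf{P}(|\xi| > n) < \infty.
```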


Thus the condition $\mathbf{E}\xi = 0$ is necessary and sufficient for $\{\varepsilon n\}_{n=1}^{\infty}$ to be an upper
sequence for both sequences $\{S_n\}$ and $\{-S_n\}$. In the next chapter, we will derive
necessary and sufficient conditions for $\{\varepsilon n\}$ to be an upper sequence for each of
the trajectories $\{S_n\}$ and $\{-S_n\}$ separately. Of course, such a condition, say, for the
sequence $\{S_n\}$ will be broader than just $\mathbf{E}\xi = 0$.

We saw that the above proof of the strong law of large numbers was based on
Theorem 10.5.3 on the finiteness of $\inf S_k$ which is based, in turn, on Wald’s identity
stated as Theorem 4.4.3. There exist other approaches to the proof that are unrelated
to Theorem 4.4.3 (see below, e.g. Theorems 11.4.2 and 12.3.1). Now we will
show that, using the strong law of large numbers, one can prove Wald’s identity
for stopping times without any additional restrictions (see e.g. conditions (a)-(d) in
Theorem 4.4.3). Furthermore, in our opinion, the proof below better elucidates the
nature of the phenomenon we are dealing with.

Consider stopping times $\nu$ with respect to a family of $\sigma$-algebras of a special
kind. In particular, we assume that a sequence of independent identically
distributed random vectors $\zeta_j = (\xi_j, \tau_j)$ is given (where $\tau_j$ can also be vectors) and

$$\mathfrak{F}_n := \sigma(\zeta_1, \ldots, \zeta_n). \tag{11.3.1}$$

Theorem 11.3.2 (Wald’s identity for stopping times) Let $\nu$ be a stopping time with
respect to the family of $\sigma$-algebras $\mathfrak{F}_n$ and assume that one of the following conditions
holds: (a) $\mathbf{E}\nu < \infty$; or (b) $a := \mathbf{E}\xi_j \ne 0$.

Then

$$\mathbf{E}S_\nu = a\,\mathbf{E}\nu. \tag{11.3.2}$$

The assertion of the theorem means that Wald’s identity is true whenever the
right-hand side is defined, i.e. only the indefinite case $0 \cdot \infty$ is excluded. Roughly
speaking, identity (11.3.2) is valid whenever it makes sense.

This identity implies that, when $\mathbf{E}\nu < \infty$, the condition $a \ne 0$ is superfluous
and that, for $a \ne 0$, the finiteness of $\mathbf{E}S_\nu$ implies that of $\mathbf{E}\nu$. If $a = 0$ then the last
assertion is not true. The reader can easily illustrate this fact using the fair game
discussed in Sect. 4.2.


Proof of Theorem 11.3.2 By the strong law of large numbers, for all large $k$, the ratio
$S_k/k$ lies in the vicinity of the point $a$. (Here and in what follows, we leave more
precise formulations to the reader.) By Lemma 11.2.1, the sequence $\{\zeta_{\nu+k}\}_{k=1}^{\infty}$ has
the same distribution as the original sequence $\{\zeta_k\}_{k=1}^{\infty}$. For this “shifted” sequence,
consider the stopping time $\nu_2$ defined the same way as $\nu$ for the original sequence.
Put $\nu_1 := \nu$ and consider the sequence $\{\zeta_{\nu_1+\nu_2+k}\}_{k=1}^{\infty}$ which is again distributed
as $\{\zeta_k\}_{k=1}^{\infty}$ (for $\nu_1 + \nu_2$ is again a stopping time). For the new sequence, define
the stopping time $\nu_3$, and so on. Clearly, the $\nu_k$ are independent and identically
distributed, and so are the differences

$$S_{N_k} - S_{N_{k-1}}, \quad k \ge 1, \quad S_0 = 0, \quad\text{where } N_k := \sum_{j=1}^{k} \nu_j.$$

By virtue of the strong law of large numbers, $S_{N_k}/N_k$ also lie in the vicinity of the
point $a$ for all large $k$ (or $N_k$).

If $\mathbf{E}\nu < \infty$ then $N_k/k$ lie in the vicinity of the point $\mathbf{E}\nu$ as $k \to \infty$. Since

$$\frac{S_{N_k}}{N_k} = \frac{S_{N_k}}{k} \cdot \frac{k}{N_k}, \tag{11.3.3}$$

$S_{N_k}/k$ is necessarily in a neighbourhood of the point $a\,\mathbf{E}\nu$ for all large $k$. This means
that the expectation $\mathbf{E}S_\nu = a\,\mathbf{E}\nu$ exists.

If $\mathbf{E}\nu = \infty$ then, for $a > 0$, the assumption that the expectation $\mathbf{E}S_\nu$ exists and
is finite, together with equality (11.3.3) and the previous argument, leads to a
contradiction, since the limit of the left-hand side of (11.3.3) equals $a > 0$, but that of
the right-hand side is zero. The contradiction vanishes only if $\mathbf{E}S_\nu = \infty$. The case
$a < 0$ is dealt with in the same way. The theorem is proved. □
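
Identity (11.3.2) is easy to probe by simulation. The sketch below is an illustration only; the step distribution and the stopping rule are assumptions chosen for simplicity, and the function names are hypothetical. It takes $\xi_k = 2$ with probability $p$ and $\xi_k = -1$ otherwise, and stops at the first step equal to 2, so that $S_\nu = 3 - \nu$, $\mathbf{E}\nu = 1/p$ and both sides of (11.3.2) equal $3 - 1/p$.

```python
import random


def sample_stopped_sum(p, rng):
    """Draw (nu, S_nu): xi_k = 2 w.p. p, -1 w.p. 1 - p; nu = first k with xi_k = 2."""
    nu, total = 0, 0
    while True:
        nu += 1
        step = 2 if rng.random() < p else -1
        total += step
        if step == 2:          # the decision to stop uses xi_1, ..., xi_nu only
            return nu, total


def estimate_wald(p, trials, seed=0):
    """Monte Carlo estimates of (E S_nu, a * E nu) with a = E xi = 2p - (1 - p)."""
    rng = random.Random(seed)
    draws = [sample_stopped_sum(p, rng) for _ in range(trials)]
    mean_nu = sum(nu for nu, _ in draws) / trials
    mean_s = sum(s for _, s in draws) / trials
    a = 2 * p - (1 - p)
    return mean_s, a * mean_nu


if __name__ == "__main__":
    print(estimate_wald(p=0.5, trials=20000))
```

For $p = 1/2$ both returned estimates should be close to 1, up to Monte Carlo error.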

We  now  return  to  the  strong  law  of  large  numbers  and  illustrate  it  by  the  following 
example. 


Example 11.3.1 Let $\omega = (\omega_1, \omega_2, \ldots)$ be a sequence of independent random
variables taking the values 1 and 0 with probabilities $p$ and $q = 1 - p$, respectively. To
each such sequence, we put into correspondence the number

$$\xi = \xi(\omega) = \sum_{k=1}^{\infty} \omega_k 2^{-k},$$

so that $\omega$ is the binary expansion of $\xi$. It is evident that the possible values of $\xi$ fill
the interval $[0, 1]$.

We show that if $p = q = 1/2$ then the distribution of $\xi$ is uniform. But if $p \ne 1/2$,
then $\xi$ has a singular distribution. Indeed, if $x = \sum_{k=1}^{\infty} \delta_k 2^{-k}$, where $\delta_k$ assume the
values 0 or 1, then

$$\{\xi < x\} = \{\omega_1 < \delta_1\} \cup \{\omega_1 = \delta_1,\ \omega_2 < \delta_2\} \cup \{\omega_1 = \delta_1,\ \omega_2 = \delta_2,\ \omega_3 < \delta_3\} \cup \cdots.$$

Since the events in this union are disjoint, for $p = 1/2$ we have

$$\mathbf{P}(\xi < x) = \sum_{k=0}^{\infty} \mathbf{P}(\omega_1 = \delta_1, \ldots, \omega_k = \delta_k,\ \omega_{k+1} < \delta_{k+1})
= \sum_{k=0}^{\infty} 2^{-k}\,\mathbf{P}(\omega_{k+1} < \delta_{k+1}) = \sum_{k=0}^{\infty} 2^{-k-1}\delta_{k+1} = x.$$

This means that the distribution of $\xi$ is uniform, i.e. for any Borel set $B \subset [0, 1]$, the
probability $\mathbf{P}(\xi \in B) = \operatorname{mes} B$ is equal to the Lebesgue measure of $B$. Put

$$D_n := \sum_{k=1}^{n} \delta_k, \qquad \Omega_n := \sum_{k=1}^{n} \omega_k.$$

Then the set $\{x : \lim_{n\to\infty} D_n/n = p\}$ is Borel measurable and hence

$$\operatorname{mes}\Bigl\{x : \lim_{n\to\infty} \frac{D_n}{n} = \frac12\Bigr\} = \mathbf{P}\Bigl(\lim_{n\to\infty} \frac{\Omega_n}{n} = \frac12\Bigr).$$

Since by the strong law of large numbers the right-hand side here is equal to one,

$$\operatorname{mes}\Bigl\{x : \lim_{n\to\infty} \frac{D_n}{n} = \frac12\Bigr\} = 1.$$

In other words, for almost all $x \in [0, 1]$, the proportion of ones in the binary
expansion of $x$ is equal to 1/2.

Now let $p \ne 1/2$. Then

$$\mathbf{P}\Bigl(\lim_{n\to\infty} \frac{\Omega_n}{n} = p\Bigr) = 1,$$

although, as we saw above,

$$\operatorname{mes}\Bigl\{x : \lim_{n\to\infty} \frac{D_n}{n} = p\Bigr\} = 0,$$

so that the probability measure is concentrated on a subset of $[0, 1]$ of Lebesgue
measure zero. On the other hand, the distribution of the random variable $\xi$ is
continuous. This follows from the fact that

$$\{\xi = x\} = \bigcap_{k=1}^{\infty} \{\omega_k = \delta_k\}$$

if $x$ is binary-irrational.

If $x$ is binary-rational, i.e. if, for some $r < \infty$, either $\delta_k = 0$ for all $k \ge r$ or $\delta_k = 1$
for all $k \ge r$, the continuity follows from the inclusion

$$\{\xi = x\} \subset \bigcap_{k=r}^{\infty} \{\omega_k = 0\} + \bigcap_{k=r}^{\infty} \{\omega_k = 1\},$$

since the probabilities of the two events on the right-hand side are clearly equal to
zero. The singularity of $F_\xi(x)$ for $p \ne 1/2$ is proved. □

We suggest that the reader plot the distribution function of $\xi$.
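
One possible way to tabulate the distribution function for plotting is to evaluate the disjoint-union formula of Example 11.3.1 digit by digit. The sketch below is an assumption-laden illustration: the names `binary_digits` and `cdf_xi` are hypothetical, the series is truncated after `depth` binary digits, and for dyadic $x$ the terminating expansion is used (which changes $\{\xi < x\}$ only by a null event when $0 < p < 1$).

```python
from fractions import Fraction


def binary_digits(x, depth):
    """First `depth` binary digits delta_1, ..., delta_depth of x in [0, 1)."""
    digits = []
    for _ in range(depth):
        x *= 2
        bit = int(x >= 1)
        digits.append(bit)
        x -= bit
    return digits


def cdf_xi(x, p, depth=40):
    """P(xi < x) for xi = sum_k omega_k 2^-k with P(omega_k = 1) = p.

    Term k of the decomposition is
      P(omega_1 = delta_1, ..., omega_k = delta_k) * P(omega_{k+1} < delta_{k+1}),
    and P(omega < delta) equals q = 1 - p if delta = 1 and 0 if delta = 0.
    """
    q = 1 - p
    total = 0
    prefix = 1                   # probability that omega matches the digits so far
    for delta in binary_digits(x, depth):
        if delta == 1:
            total += prefix * q  # next omega is 0 while the digit of x is 1
            prefix *= p
        else:
            prefix *= q
    return total


if __name__ == "__main__":
    grid = [Fraction(j, 16) for j in range(16)]
    for p in (Fraction(1, 2), Fraction(1, 3)):
        print(p, [str(cdf_xi(x, p)) for x in grid])
```

With `Fraction` arguments the values at dyadic points are exact; for $p = 1/2$ the function returns $x$ itself, in agreement with the uniformity shown in the example.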


11.4  The  Strong  Law  of  Large  Numbers  for  Arbitrary 
Independent  Variables 

Finding necessary and sufficient conditions for convergence

$$\frac{S_n}{b_n} \xrightarrow{a.s.} a$$

when $b_n \uparrow \infty$ and the summands $\xi_1, \xi_2, \ldots$ are not identically distributed is a
difficult task. We first prove the following theorem.

Theorem 11.4.1 (Kolmogorov’s test for almost everywhere convergence) Assume
that $\xi_k$, $k = 1, 2, \ldots$, are independent, $\mathbf{E}\xi_k = 0$, $\operatorname{Var}(\xi_k) = \sigma_k^2 < \infty$ and, moreover,

$$\sum_{k=1}^{\infty} \frac{\sigma_k^2}{b_k^2} < \infty. \tag{11.4.1}$$

Then $\dfrac{S_n}{b_n} \xrightarrow{a.s.} 0$ as $n \to \infty$.

Proof It follows from the conditions of Theorem 11.4.1 that (see Corollary 11.2.2)
the series $\sum_{k=1}^{\infty} \xi_k/b_k$ is convergent with probability 1. Therefore the assertion of
Theorem 11.4.1 is a consequence of the following well-known lemma from
calculus. □

Lemma 11.4.1 Let $b_n \uparrow \infty$ and a sequence $x_1, x_2, \ldots$ be such that the series
$\sum_{k=1}^{\infty} x_k$ is convergent. Then, as $n \to \infty$,

$$\frac{1}{b_n} \sum_{k=1}^{n} b_k x_k \to 0.$$


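Proof Put $X_n := \sum_{k=n+1}^{\infty} x_k$ so that $X_n \to 0$ as $n \to \infty$, and $X :=
\max_{n \ge 0} |X_n| < \infty$. Using the Abel transform, we obtain that

$$\sum_{k=1}^{n} b_k x_k = \sum_{k=1}^{n} b_k (X_{k-1} - X_k) = \sum_{k=0}^{n-1} b_{k+1} X_k - \sum_{k=1}^{n} b_k X_k
= \sum_{k=1}^{n-1} (b_{k+1} - b_k) X_k + b_1 X_0 - b_n X_n,$$

$$\limsup_{n\to\infty} \frac{1}{b_n} \sum_{k=1}^{n} b_k x_k = \limsup_{n\to\infty} \frac{1}{b_n} \sum_{k=1}^{n-1} (b_{k+1} - b_k) X_k. \tag{11.4.2}$$

Here, for a given $\varepsilon > 0$, we can choose an $N$ such that $|X_k| < \varepsilon$ for $k \ge N$. Therefore

$$\sum_{k=1}^{n-1} (b_{k+1} - b_k) X_k \le \sum_{k=1}^{N-1} (b_{k+1} - b_k) X + \varepsilon \sum_{k=N}^{n-1} (b_{k+1} - b_k)
= X(b_N - b_1) + \varepsilon (b_n - b_N).$$

From here and (11.4.2) it follows that

$$\limsup_{n\to\infty} \frac{1}{b_n} \sum_{k=1}^{n} b_k x_k \le \varepsilon.$$

Since a similar inequality holds for lim inf, the lemma is proved. □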
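
A quick numerical illustration of Lemma 11.4.1, given here as a sketch with made-up helper names: take $x_k = (-1)^{k+1}/k$ (a convergent series) and $b_k = k$. Then $b_k x_k = (-1)^{k+1}$, so the weighted average equals $0$ for even $n$ and $1/n$ for odd $n$, and tends to zero as the lemma predicts.

```python
from fractions import Fraction


def kronecker_average(x, b, n):
    """Return (1 / b_n) * sum_{k=1}^{n} b_k * x_k for callables x(k) and b(k)."""
    return sum(b(k) * x(k) for k in range(1, n + 1)) / b(n)


def alternating_harmonic(k):
    """x_k = (-1)^(k+1) / k; the series sum_k x_k converges."""
    return Fraction((-1) ** (k + 1), k)


def identity_weights(k):
    """b_k = k, an increasing sequence tending to infinity."""
    return Fraction(k)


if __name__ == "__main__":
    for n in (10, 11, 1000, 1001):
        print(n, kronecker_average(alternating_harmonic, identity_weights, n))
```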

We  could  also  prove  Theorem  11.4.1  directly,  using  the  Kolmogorov  inequality, 
in  a  way  similar  to  the  argument  in  Theorem  11.2.1. 


Example 11.4.1 Assume that $\xi_k$, $k = 1, 2, \ldots$, are independent random variables
taking the values $\xi_k = \pm k^{\alpha}$ with probabilities 1/2. As we saw in Example 8.4.1,
for $\alpha > -1/2$, the sums $S_n$ of these variables are asymptotically normal with the
appropriate normalising factor $n^{-\alpha-1/2}$. Since $\operatorname{Var}(\xi_k) = \sigma_k^2 = k^{2\alpha}$, we see that, for
$\beta > \alpha + 1/2$, $n^{-\beta} S_n$ satisfies the strong law of large numbers because, for $b_k = k^{\beta}$,
the series

$$\sum_{k=1}^{\infty} \sigma_k^2 b_k^{-2} = \sum_{k=1}^{\infty} k^{2\alpha - 2\beta}$$

converges. The “usual” strong law of large numbers (with the normalising factor
$n^{-1}$) holds if the value $\beta = 1$ is admissible, i.e. when $\alpha < 1/2$.

Now we will derive the “usual” strong law of large numbers (with scaling factor
$1/n$) under conditions which do not assume the existence of the variances $\operatorname{Var}(\xi_k)$
and are, in a certain sense, minimal. The following generalisation of the “sufficiency
part” of Theorem 11.3.1 is valid.

Theorem 11.4.2 Let $\mathbf{E}\xi_k = 0$ and the tails $\mathbf{P}(|\xi_k| > t)$ admit a common integrable
majorant:

$$\mathbf{P}(|\xi_k| > t) \le g(t), \qquad \int_0^{\infty} g(t)\,dt < \infty. \tag{11.4.3}$$

Then, as $n \to \infty$,

$$\frac{S_n}{n} \xrightarrow{a.s.} 0. \tag{11.4.4}$$

Note that condition (11.4.3) can also be rewritten as $|\xi_k| \stackrel{d}{\le} \zeta$, $\mathbf{E}\zeta < \infty$. To
see this, it suffices to consider a random variable $\zeta \ge 0$ for which $\mathbf{P}(\zeta > t) =
\min(1, g(t))$. Here, without loss of generality, we can assume that $g(t)$ is
nonincreasing (we can take the minimal majorant $g(t) := \sup_k \mathbf{P}(|\xi_k| > t) \downarrow$).

Condition (11.4.3) clearly implies the uniform integrability of $\xi_k$. The latter was
sufficient for the law of large numbers, but is insufficient for the strong law of large
numbers. This is shown by the following example.


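Example 11.4.2 Let $\xi_k$ be such that, for $t > 0$ and $k > 1$,

$$\mathbf{P}(\xi_k \ge t) = \begin{cases} g(t) + \dfrac{1}{k \ln k} & \text{if } t \le k, \\[4pt] g(t) & \text{if } t > k, \end{cases}
\qquad \mathbf{P}(\xi_k < -t) \le g(t),$$

where $g(t)$ is integrable so that $\xi_k$ has a positive atom of size $1/(k \ln k)$ at the
point $k$. Evidently, the $\xi_k$ are uniformly integrable. Now suppose that $S_n/n \xrightarrow{a.s.} 0$.
Since

$$\sum_{k=2}^{\infty} \mathbf{P}(\xi_k \ge k) \ge \sum_{k=2}^{\infty} \frac{1}{k \ln k} = \infty,$$

it follows by the Borel-Cantelli lemma that infinitely many events $\{\xi_k \ge k\}$ occur
with probability 1. Since, for any $\varepsilon < 1/2$ and all $k$ large enough, $|S_k| < \varepsilon k$ with
probability 1, the events $S_{k+1} = S_k + \xi_{k+1} > k(1 - \varepsilon)$ occur infinitely often. We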
have obtained a contradiction. □

Proof of Theorem 11.4.2 Represent the random variables $\xi_k$ in the form

$$\xi_k = \xi_k^{*} + \xi_k^{**}, \qquad \xi_k^{*} := \xi_k\,\mathrm{I}(|\xi_k| \le k), \qquad \xi_k^{**} := \xi_k\,\mathrm{I}(|\xi_k| > k),$$

and denote by $S_n^{*}$ and $S_n^{**}$ the respective sums of random variables $\xi_k^{*}$ and $\xi_k^{**}$. Then
the sum $S_n$ can be written as

$$S_n = (S_n^{*} - \mathbf{E}S_n^{*}) + S_n^{**} - \mathbf{E}S_n^{**}. \tag{11.4.5}$$

Now we will evaluate the three summands on the right-hand side of (11.4.5).

1. Since $\xi_k$ are uniformly integrable, we have

$$\mathbf{E}\xi_k^{**} = o(1) \ \text{as } k \to \infty, \qquad \mathbf{E}S_n^{**} = o(n), \qquad \frac{\mathbf{E}S_n^{**}}{n} \to 0 \ \text{as } n \to \infty. \tag{11.4.6}$$

2. Since

$$\sum_{k=1}^{\infty} \mathbf{P}(|\xi_k| > k) \le \sum_{k=1}^{\infty} g(k) < \infty,$$

we obtain from Theorem 11.1.1 that, with probability 1, only a finite number of
random variables $\xi_k^{**}$ are nonzero and hence, as $n \to \infty$,

$$\frac{S_n^{**}}{n} \xrightarrow{a.s.} 0. \tag{11.4.7}$$

3. To bound the first summand on the right-hand side of (11.4.5) we make use of
Theorem 11.4.1. Since

$$\operatorname{Var}(\xi_k^{*}) \le \mathbf{E}(\xi_k^{*})^2 = 2\int_0^{k} u\,\mathbf{P}(|\xi_k^{*}| > u)\,du \le 2\int_0^{k} u\,g(u)\,du,$$

we see that the series in (11.4.1) for $\xi_k^{*} - \mathbf{E}\xi_k^{*}$ and $b_k = k$ admits the upper bound

$$2\sum_{k=1}^{\infty} \frac{1}{k^2} \int_0^{k} u\,g(u)\,du. \tag{11.4.8}$$

The last series converges if the integral

$$\int_1^{\infty} \frac{1}{t^2} \Bigl(\int_0^{t} u\,g(u)\,du\Bigr)\,dt$$

converges. Integrating by parts, we obtain

$$-\frac{1}{t} \int_0^{t} u\,g(u)\,du\,\Bigr|_{1}^{\infty} + \int_1^{\infty} g(t)\,dt. \tag{11.4.9}$$

The last summand here is clearly finite. Since $g(u)$ is integrable and monotone, one
has

$$u\,g(u) = o(1) \ \text{as } u \to \infty, \qquad \int_0^{t} u\,g(u)\,du = o(t) \ \text{as } t \to \infty,$$

and hence the value of the first summand in (11.4.9) is zero at $t = \infty$. We have
established that series (11.4.8) converges, and hence, by Theorem 11.4.1, as $n \to \infty$,

$$\frac{S_n^{*} - \mathbf{E}S_n^{*}}{n} \xrightarrow{a.s.} 0. \tag{11.4.10}$$

Combining (11.4.5)-(11.4.7) and (11.4.10), we obtain (11.4.4). The theorem is
proved. □
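
The two $o(\cdot)$ relations used in part 3 of the proof follow from monotonicity and integrability of $g$ alone. The derivation below is a sketch of that step, added for convenience; the auxiliary threshold $U$ is introduced here only for the illustration.

```latex
% g is nonincreasing, so g(u) \le g(s) for u/2 \le s \le u. Hence
\frac{u}{2}\,g(u) \;\le\; \int_{u/2}^{u} g(s)\,ds \;\le\; \int_{u/2}^{\infty} g(s)\,ds
  \;\longrightarrow\; 0 \quad (u \to \infty),
% which gives u g(u) = o(1).
% Given \varepsilon > 0, choose U with u g(u) < \varepsilon for all u \ge U. Then for t > U
\int_0^{t} u\,g(u)\,du \;\le\; \int_0^{U} u\,g(u)\,du + \varepsilon\,(t - U),
% and dividing by t and letting t \to \infty yields
\limsup_{t \to \infty} \frac{1}{t} \int_0^{t} u\,g(u)\,du \;\le\; \varepsilon ,
% i.e. \int_0^t u g(u)\,du = o(t).
```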


11.5  The  Strong  Law  of  Large  Numbers  for  Generalised 
Renewal  Processes 

11.5.1  The  Strong  Law  of  Large  Numbers  for  Renewal  Processes 


Let $\{\tau_j\}$ be a sequence of independent identically distributed variables, $T_n :=
\sum_{j=1}^{n} \tau_j$ and $\eta(t) := \min\{k : T_k > t\}$.

Theorem 11.5.1 If $\tau_j \stackrel{d}{=} \tau$ and $\mathbf{E}\tau = a > 0$ exists then, as $t \to \infty$,

$$\frac{\eta(t)}{t} \xrightarrow{a.s.} \frac{1}{a}, \tag{11.5.1}$$

i.e., for any $\varepsilon > 0$,

$$\mathbf{P}\Bigl(\Bigl|\frac{\eta(u)}{u} - \frac{1}{a}\Bigr| < \varepsilon \ \text{for all } u \ge t\Bigr) \to 1 \tag{11.5.2}$$

as $t \to \infty$.

Proof First let $\tau \ge 0$. Set

$$A_n := \Bigl\{\Bigl|\frac{T_k}{k} - a\Bigr| < \varepsilon \ \text{for all } k \ge n\Bigr\}.$$

The strong law of large numbers for $\{T_k\}$ means that $\mathbf{P}(A_n) \to 1$ as $n \to \infty$.

Consider the function $T(v) := T_{\lfloor v \rfloor}$, where $\lfloor v \rfloor$ is the integer part of $v$. As was
noted in Sect. 10.1, $\eta(t)$ is the generalised inverse function to $T(v)$. In other words,
if we plot the graph of the function $T(v)$ as a continuous line (including “vertical”
segments corresponding to jumps) then $\eta(t)$ can be regarded as the abscissa of the
point of intersection of the graph of $T(v)$ with level $t$ (see Fig. 11.1); for the values
of $t$ coinciding with $T_k$, the intersection will be a segment of length 1, and $\eta(t)$ is
then to be taken equal to the right end point of the segment.

Therefore the event that $T(v)$ lies within the limits $v(a \pm \varepsilon)$ for all sufficiently
large $v$ coincides with the event that $\eta(t)$ lies within the limits $t/(a \pm \varepsilon)$ for all
sufficiently large $t$. More precisely,

$$A_n \subset B_n := \Bigl\{\frac{u}{a - \varepsilon} > \eta(u) > \frac{u}{a + \varepsilon} \ \text{for all } u \ge n(a + \varepsilon)\Bigr\}.$$

This means that

$$\mathbf{P}(B_n) \to 1$$

as $n \to \infty$.

This relation is clearly equivalent to (11.5.2).

Fig. 11.1 The relative positions of a trajectory of $T(v)$ and the levels $v(a \pm \varepsilon)$
(see the proof of Theorem 11.5.1)


Now suppose $\tau$ can also assume negative values. Then $\eta(t) := \min\{k : \overline{T}_k > t\}$,
where $\overline{T}_k = \max_{j \le k} T_j$, so that $\eta(t)$ is the generalised inverse function
of $\overline{T}(v) := \overline{T}_{\lfloor v \rfloor}$. Moreover, it is clear that, if $T(v)$ lies within the limits $v(a \pm \varepsilon)$
for all sufficiently large $v$, then the same is true for the function $\overline{T}(v)$. It remains to
repeat the above argument applying it to the processes $\overline{T}(v)$ and $\eta(t)$. The theorem
is proved. □

Remark 11.5.1 (An analogue of Remarks 8.3.3, 8.4.1, 10.1.1 and 10.5.1) Convergence
(11.5.1) persists if we remove all the restrictions on the random variable $\tau_1$.
Namely, the following assertion generalising Theorem 11.5.1 is valid. Let $\tau_1$ be an
arbitrary random variable and the variables $\tau_{k+1} \stackrel{d}{=} \tau$, $k \ge 1$, satisfy the
conditions of Theorem 11.5.1. Then (11.5.1) holds true.

The Proof of this assertion is quite similar to the proofs of the corresponding
assertions in the above mentioned remarks, and we leave it to the reader. □

These assertions show that replacement of one or several terms in the considered
sequences of random variables with arbitrary variables changes nothing in the
established convergence relations. (The exception is Theorem 11.1.1, in which the
condition $\mathbf{E}\min(0, \tau_1) > -\infty$ is essential.) This fact will be used in Chap. 13
devoted to Markov chains.


11.5.2  The  Strong  Law  of  Large  Numbers  for  Generalised 
Renewal  Processes 

Now let a sequence of independent identically distributed random vectors $(\tau_j, \xi_j) \overset{d}{=} (\tau, \xi)$ be given and $S_n = \sum_{j=1}^{n} \xi_j$. Our goal is to obtain an analogue of Theorem 11.5.1 for generalised renewal processes $S(t) = S_{\eta(t)}$ (see Sect. 10.6).

Theorem 11.5.2 If $\tau > 0$ and there exist $a := \mathbf{E}\tau$ and $a_\xi := \mathbf{E}\xi$, then

$$\frac{S(t)}{t} \xrightarrow{\text{a.s.}} \frac{a_\xi}{a} \quad \text{as } t \to \infty.$$

The proof of the theorem is almost obvious. It follows from the representation

$$\frac{S(t)}{t} = \frac{S_{\eta(t)}}{\eta(t)} \cdot \frac{\eta(t)}{t}$$

and the a.s. convergence relations

$$\frac{S_n}{n} \xrightarrow{\text{a.s.}} a_\xi, \qquad \frac{\eta(t)}{t} \xrightarrow{\text{a.s.}} \frac{1}{a}.$$

□

Note that the independence of the components $\tau$ and $\xi$ is not assumed here.
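The convergence in Theorem 11.5.2 can be illustrated by a small simulation. The sketch below is only an illustration under assumptions that are not part of the text: $\tau$ is taken exponential with mean $a = 2$, and $\xi = 3\tau + \text{noise}$ (so $\xi$ depends on $\tau$ and $a_\xi = 6$); the function name `generalised_renewal_ratio` is hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def generalised_renewal_ratio(t, n, seed=0):
    """Simulate S(t)/t for the generalised renewal process S(t) = S_{eta(t)}.

    Illustrative assumptions (not from the text): tau ~ Exp with mean 2,
    xi = 3*tau + standard normal noise, so that a = 2 and a_xi = 6.
    """
    rng = random.Random(seed)
    taus = [rng.expovariate(0.5) for _ in range(n)]            # E tau = 2
    xis = [3.0 * tau + rng.gauss(0.0, 1.0) for tau in taus]    # E xi = 6
    T = list(accumulate(taus))   # T[i] = tau_1 + ... + tau_{i+1}
    S = list(accumulate(xis))    # S[i] = xi_1 + ... + xi_{i+1}
    i = bisect_right(T, t)       # first 0-based index with T[i] > t, i.e. eta(t) = i + 1
    if i >= n:
        raise ValueError("n is too small: the simulated path does not cross level t")
    return S[i] / t


ratio = generalised_renewal_ratio(t=50_000.0, n=50_000)
print(ratio)  # should be close to a_xi / a = 3
```

The components $\tau$ and $\xi$ are deliberately dependent here, in line with the remark that independence is not assumed.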


Chapter  12 

Random  Walks  and  Factorisation  Identities 


Abstract In this chapter, several remarkable and rather useful relations establishing interconnections between different characteristics of random walks (the so-called boundary functionals) are derived, and the arising problems are related to the simplest boundary problems of Complex Analysis. Section 12.1 introduces the concept of factorisation identity and derives two fundamental identities of that kind. Some consequences of these identities, including the trichotomy theorem on the oscillatory behaviour of random walks and a one-sided version of the Strong Law of Large Numbers, are presented in Sect. 12.2. Pollaczek-Spitzer's identity and an identity for the global maximum of the random walk are derived in Sect. 12.3, and these results are then illustrated by examples from ruin theory and the theory of queueing systems in Sect. 12.4. Sections 12.5 and 12.6 are devoted to studying the cases where factorisation components can be obtained in explicit form, so that closed-form expressions are available for the distributions of a number of important boundary functionals. Sections 12.7 and 12.8 employ factorisation identities to derive the asymptotic properties of the distribution of the excess of a random walk over a high level and that of the global maximum of the walk, and also to analyse the distribution of the first passage time.


In the present chapter we derive several remarkable and rather useful relations establishing interconnections between different characteristics of random walks (the so-called boundary functionals) and also relate the arising problems to the simplest boundary problems of complex analysis.

12.1  Factorisation  Identities 

12.1.1  Factorisation 

On the plane of a complex variable $\lambda$, denote by $\Pi$ the real axis $\operatorname{Im}\lambda = 0$ and by $\Pi_+$ ($\Pi_-$) the half-plane $\operatorname{Im}\lambda > 0$ ($\operatorname{Im}\lambda < 0$). Let $f(\lambda)$ be a continuous function defined on $\Pi$.

Definition 12.1.1 If there exists a representation

$$f(\lambda) = f_+(\lambda) f_-(\lambda), \quad \lambda \in \Pi, \tag{12.1.1}$$

where $f_\pm$ are analytic in the domains $\Pi_\pm$ and continuous on $\Pi_\pm \cup \Pi$, then we will say that the function $f$ allows factorisation. The functions $f_\pm$ are called factorisation components (positive and negative, respectively).

Further, denote by $\mathfrak{K}$ the class of functions $f$ defined on $\Pi$ that are continuous and such that

$$\sup_{\lambda \in \Pi} |f(\lambda)| < \infty, \qquad \inf_{\lambda \in \Pi} |f(\lambda)| > 0. \tag{12.1.2}$$

Similarly we define the classes $\mathfrak{K}_\pm$ of functions analytic in $\Pi_\pm$ and continuous on $\Pi_\pm \cup \Pi$, such that

$$\sup_{\lambda \in \Pi_\pm} |f_\pm(\lambda)| < \infty, \qquad \inf_{\lambda \in \Pi_\pm} |f_\pm(\lambda)| > 0. \tag{12.1.3}$$

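Definition 12.1.2 If, for an $f \in \mathfrak{K}$, there exists a representation (12.1.1), where $f_\pm \in \mathfrak{K}_\pm$, then we will say that the function $f$ allows canonical factorisation. Representations of the form

$$f(\lambda) = f_+(\lambda) f_-(\lambda) f_0, \qquad f(\lambda) = \frac{f_+(\lambda) f_0}{f_-(\lambda)}, \qquad \lambda \in \Pi,$$

where $f_0 = \text{const}$ and $f_\pm \in \mathfrak{K}_\pm$, are also called canonical factorisations.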


Lemma 12.1.1 The components $f_\pm$ of a canonical factorisation of a function $f \in \mathfrak{K}$ are defined uniquely up to a constant factor.

Proof Together with the canonical factorisation (12.1.1), let there exist another canonical factorisation

$$f(\lambda) = g_+(\lambda) g_-(\lambda), \quad \lambda \in \Pi.$$

Then

$$f_+(\lambda) f_-(\lambda) = g_+(\lambda) g_-(\lambda), \quad \lambda \in \Pi,$$

and, by (12.1.2), we can divide both sides of the equality by $g_+(\lambda) f_-(\lambda)$. We get

$$\frac{f_+(\lambda)}{g_+(\lambda)} = \frac{g_-(\lambda)}{f_-(\lambda)},$$

where, by virtue of (12.1.2), the function $f_+/g_+$ ($g_-/f_-$) belongs to the class $\mathfrak{K}_+$ ($\mathfrak{K}_-$). We have obtained that the function $f_+/g_+$, analytic in $\Pi_+$, can be analytically continued over the line $\Pi$ onto the half-plane $\Pi_-$ (to the function $g_-/f_-$). After such a continuation, in view of (12.1.3), this function remains bounded on the whole complex plane. By Liouville's theorem, bounded entire functions must be constant, i.e. there exists a constant $c$ such that, on the whole plane,

$$\frac{f_+(\lambda)}{g_+(\lambda)} = \frac{g_-(\lambda)}{f_-(\lambda)} = c$$

holds, so $f_+(\lambda) = c\, g_+(\lambda)$, $f_-(\lambda) = c^{-1} g_-(\lambda)$. The lemma is proved. □

The factorisation problem consists in finding conditions under which a given
function  f  admits  a  factorisation,  and  in  finding  the  components  of  the  factorisation. 
This  problem  has  a  number  of  important  applications  to  solving  integral  equations 
and  is  a  version  of  the  well-known  Cauchy-Riemann  boundary-value  problem  in 
complex  function  theory.  We  will  see  later  that  factorisation  is  also  an  important 
tool  for  studying  the  so-called  boundary  problems  in  probability  theory. 


12.1.2 The Canonical Factorisation of the Function $f_z(\lambda) = 1 - z\varphi(\lambda)$

Let $(\Omega, \mathfrak{F}, \mathbf{P})$ be a probability space on which a sequence $\{\xi_k\}_{k=1}^{\infty}$ of independent identically distributed ($\xi_k \overset{d}{=} \xi$) random variables is given. Put, as before, $S_n := \sum_{k=1}^{n} \xi_k$ and $S_0 = 0$. The sequence $\{S_k\}_{k=0}^{\infty}$ forms a random walk.

First of all, note that the function

$$f_z(\lambda) := 1 - z\varphi(\lambda), \qquad \varphi(\lambda) := \mathbf{E} e^{i\lambda\xi}, \qquad \lambda \in \Pi,$$

belongs to $\mathfrak{K}$ for all $z$ with $|z| < 1$ (here $z$ is a complex-valued parameter). This follows from the inequalities $|\varphi(\lambda)| \le 1$ for $\lambda \in \Pi$ and $|z\varphi(\lambda)| \le |z| < 1$.


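Theorem 12.1.1 (The first factorisation identity) For $|z| < 1$, the function $f_z(\lambda)$ admits the canonical factorisation

$$f_z(\lambda) = f_{z+}(\lambda)\, C(z)\, f_{z-}(\lambda), \quad \lambda \in \Pi, \tag{12.1.4}$$

where

$$\begin{aligned}
f_{z+}(\lambda) &= \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{E}\big(e^{i\lambda S_k};\ S_k > 0\big)\Big\} \in \mathfrak{K}_+,\\
f_{z-}(\lambda) &= \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{E}\big(e^{i\lambda S_k};\ S_k < 0\big)\Big\} \in \mathfrak{K}_-,\\
C(z) &= \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{P}(S_k = 0)\Big\}.
\end{aligned} \tag{12.1.5}$$

Proof Since $|z| < 1$, $\ln(1 - z\varphi(\lambda))$ exists, understood in the principal value sense. The following equalities give the desired decomposition:

$$\begin{aligned}
f_z(\lambda) &= e^{\ln(1 - z\varphi(\lambda))} = \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k \varphi^k(\lambda)}{k}\Big\} = \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{E} e^{i\lambda S_k}\Big\}\\
&= \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{E}\big(e^{i\lambda S_k};\ S_k > 0\big)\Big\} \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{P}(S_k = 0)\Big\}\\
&\qquad \times \exp\Big\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{E}\big(e^{i\lambda S_k};\ S_k < 0\big)\Big\}.
\end{aligned}$$

Show that $f_{z+}(\lambda) \in \mathfrak{K}_+$. Indeed, the function $\mathbf{E}(e^{i\lambda S_k};\ S_k > 0)$, for every $k$ and $\lambda \in \Pi_+ \cup \Pi$, does not exceed 1 in absolute value, is analytic in $\Pi_+$, and is continuous on $\Pi_+ \cup \Pi$. Analyticity follows from the differentiability of this function at any point $\lambda \in \Pi_+$ (see also Property 6 of ch.f.s in Sect. 7.1). The function $\ln f_{z+}(\lambda)$ is a uniformly converging series of functions analytic in $\Pi_+$, and hence possesses the same properties together with the function $f_{z+}(\lambda)$. The same can be said about the continuity on $\Pi \cup \Pi_+$.

That $f_{z-}(\lambda) \in \mathfrak{K}_-$ is established in a similar way. The theorem is proved. □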
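The identity of Theorem 12.1.1 is easy to check numerically for a lattice walk, where the distributions of $S_k$ can be computed exactly by convolution. The following is a minimal sketch under illustrative assumptions: the step distribution `step_pmf`, the values of `z` and `lam`, the truncation level `terms` and the helper names are all hypothetical choices, not taken from the text.

```python
import cmath


def convolve(p, q):
    """Distribution of the sum of two independent integer-valued variables."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * qy
    return out


def factorisation_components(step_pmf, z, lam, terms=60):
    """Truncated series for f_{z+}(lam), C(z), f_{z-}(lam) of Theorem 12.1.1."""
    pos = neg = zero = 0j
    pmf_k = {0: 1.0}                      # distribution of S_0
    for k in range(1, terms + 1):
        pmf_k = convolve(pmf_k, step_pmf)  # distribution of S_k
        weight = z ** k / k
        for s, pr in pmf_k.items():
            term = weight * pr * cmath.exp(1j * lam * s)
            if s > 0:
                pos += term
            elif s < 0:
                neg += term
            else:
                zero += term
    return cmath.exp(-pos), cmath.exp(-zero), cmath.exp(-neg)


# Hypothetical example: steps -1, 0, 2 with probabilities 0.5, 0.2, 0.3.
step_pmf = {-1: 0.5, 0: 0.2, 2: 0.3}
z, lam = 0.4 + 0.2j, 0.7                  # |z| < 1, lam real
f_plus, C, f_minus = factorisation_components(step_pmf, z, lam)
phi = sum(pr * cmath.exp(1j * lam * x) for x, pr in step_pmf.items())
lhs = 1 - z * phi
rhs = f_plus * C * f_minus
print(abs(lhs - rhs))                     # truncation and rounding error only
```

Since $|z|^{60}$ is negligible here, the two sides should agree up to rounding error.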


12.1.3  The  Second  Factorisation  Identity 

The second factorisation identity is associated with the so-called boundary functionals of the random walk $\{S_k\}$. On the main probability space $(\Omega, \mathfrak{F}, \mathbf{P})$ we define, together with $\{\xi_k\}$, the random variable

$$\eta_+^0 := \min\{k \ge 1 : S_k \ge 0\}.$$

This is the first-passage time to zero level. For the elementary events such that all $S_k < 0$, $k \ge 1$, we put $\eta_+^0 := \infty$. Like the random variable $\eta(0)$ in Sect. 10.1, the variable $\eta_+^0$ is a Markov time.

The random variable $\chi_+^0 := S_{\eta_+^0}$ is called the first nonnegative sum. It is defined on the set $\{\eta_+^0 < \infty\}$ only.

The first passing time of zero from the right

$$\eta_-^0 := \min\{k \ge 1 : S_k \le 0\}$$

possesses quite similar properties, and so does the first nonpositive sum $\chi_-^0 := S_{\eta_-^0}$.

Studying the properties of the introduced random variables, which are called boundary functionals of the random walk $\{S_k\}$, is of significant independent interest. For instance, the variable $\eta_+^0$ is a stopping time, and understanding its nature is essential for studying stopping times in many more complex problems (see e.g. the problems of the renewal theory in Chap. 10, the problems of statistical control described in Sect. 4.4 and so on). Moreover, the variables $\eta_+^0$ and $\chi_+^0$ will be needed to describe the extrema

$$\zeta := \sup(S_1, S_2, \ldots) \quad \text{and} \quad \gamma := \inf(S_1, S_2, \ldots),$$

which are also termed boundary functionals and play an important role in the problems of mathematical statistics, queueing theory (see Sect. 12.4), etc.

Put, as before, $\varphi(\lambda) := \varphi_\xi(\lambda) = \mathbf{E} e^{i\lambda\xi}$.

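Theorem 12.1.2 (The second factorisation identity) For the ch.f. of the joint distributions of the introduced random variables, for $|z| < 1$ and $\operatorname{Im}\lambda = 0$, the canonical factorisation

$$f_z(\lambda) := 1 - z\varphi(\lambda) = \big[1 - \mathbf{E}\big(e^{i\lambda\chi_+^0} z^{\eta_+^0};\ \eta_+^0 < \infty\big)\big]\, D^{-1}(z)\, \big[1 - \mathbf{E}\big(e^{i\lambda\chi_-^0} z^{\eta_-^0};\ \eta_-^0 < \infty\big)\big]$$

of $f_z(\lambda)$ holds true, where

$$D(z) := 1 - \mathbf{E}\big(z^{\eta_+^0};\ \chi_+^0 = 0,\ \eta_+^0 < \infty\big) = 1 - \mathbf{E}\big(z^{\eta_-^0};\ \chi_-^0 = 0,\ \eta_-^0 < \infty\big).$$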


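Proof Set $\zeta_n := \max(S_1, \ldots, S_n)$. We have

$$\begin{aligned}
\varphi^n(\lambda) = \mathbf{E} e^{i\lambda S_n} &= \sum_{k=1}^{n} \mathbf{E}\big(e^{i\lambda S_n};\ \eta_+^0 = k\big) + \mathbf{E}\big(e^{i\lambda S_n};\ \zeta_n < 0\big)\\
&= \sum_{k=1}^{n} \mathbf{E}\big(e^{i\lambda(S_n - S_k)} e^{i\lambda S_k} I(\eta_+^0 = k)\big) + M_n,
\end{aligned} \tag{12.1.6}$$

where $M_n = \mathbf{E}(e^{i\lambda S_n};\ \zeta_n < 0)$ and $I(A)$ is the indicator of the event $A$. For each fixed $k$, the random variables $S_n - S_k$ and $S_k I(\eta_+^0 = k) = \chi_+^0 I(\eta_+^0 = k)$ are independent. Hence,

$$\varphi^n(\lambda) = \sum_{k=1}^{n} \varphi^{n-k}(\lambda)\, \mathbf{E}\big(e^{i\lambda\chi_+^0};\ \eta_+^0 = k\big) + M_n.$$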

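Now multiply both sides by $z^n$, $n = 0, 1, \ldots$, and then sum up over $n$. We will use the convention that, for $n = 0$,

$$\sum_{k=1}^{n} = 0, \qquad M_0 = 1.$$

For the convolution of two sequences $c_n = \sum_{k=1}^{n} a_k b_{n-k}$, we have

$$\sum_{n=0}^{\infty} c_n z^n = \sum_{n=1}^{\infty} a_n z^n \sum_{n=0}^{\infty} b_n z^n,$$

provided that the series in this equality converge absolutely. Since $|z| < 1$ and $|\varphi(\lambda)| \le 1$ for $\operatorname{Im}\lambda = 0$, one has

$$\begin{aligned}
\sum_{n=0}^{\infty} z^n \varphi^n(\lambda) = \frac{1}{1 - z\varphi(\lambda)} &= \sum_{n=0}^{\infty} z^n \sum_{k=1}^{n} \varphi^{n-k}(\lambda)\, \mathbf{E}\big(e^{i\lambda\chi_+^0};\ \eta_+^0 = k\big) + \sum_{n=0}^{\infty} z^n M_n\\
&= \sum_{k=1}^{\infty} z^k\, \mathbf{E}\big(e^{i\lambda\chi_+^0};\ \eta_+^0 = k\big) \sum_{n=0}^{\infty} z^n \varphi^n(\lambda) + \sum_{n=0}^{\infty} z^n M_n\\
&= \frac{1}{1 - z\varphi(\lambda)}\, \mathbf{E}\big(e^{i\lambda\chi_+^0} z^{\eta_+^0};\ \eta_+^0 < \infty\big) + \sum_{n=0}^{\infty} z^n M_n,
\end{aligned}$$

or, which is the same,

$$f_z(\lambda) = 1 - z\varphi(\lambda) = \frac{1 - \mathbf{E}\big(e^{i\lambda\chi_+^0} z^{\eta_+^0};\ \eta_+^0 < \infty\big)}{\sum_{n=0}^{\infty} z^n\, \mathbf{E}\big(e^{i\lambda S_n};\ \zeta_n < 0\big)} =: \frac{a_{z+}(\lambda)}{a_{z-}(\lambda)}, \tag{12.1.7}$$

where $a_{z\pm}(\lambda)$ denote the numerator and denominator of the ratio obtained for $f_z(\lambda)$.

It is easy to see that, if we put

$$\gamma_n := \min(S_1, \ldots, S_n)$$

then, repeating the above arguments, we will arrive at the equality

$$f_z(\lambda) = \frac{1 - \mathbf{E}\big(e^{i\lambda\chi_-^0} z^{\eta_-^0};\ \eta_-^0 < \infty\big)}{\sum_{n=0}^{\infty} z^n\, \mathbf{E}\big(e^{i\lambda S_n};\ \gamma_n > 0\big)} =: \frac{b_{z-}(\lambda)}{b_{z+}(\lambda)}, \tag{12.1.8}$$

where, similarly to the above, $b_{z\mp}(\lambda)$, respectively, denote the numerator and denominator in relation (12.1.8).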

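Now we show that $a_{z\pm}(\lambda) \in \mathfrak{K}$ and $b_{z\pm}(\lambda) \in \mathfrak{K}$ for $|z| < 1$. Indeed, for $|z| < 1$ and $\operatorname{Im}\lambda = 0$, since $\eta_+^0 \ge 1$,

$$\big|\mathbf{E}\big(e^{i\lambda\chi_+^0} z^{\eta_+^0};\ \eta_+^0 < \infty\big)\big| \le \mathbf{E}\big(|z|^{\eta_+^0};\ \eta_+^0 < \infty\big) \le |z| < 1$$

and therefore

$$\sup_{\lambda \in \Pi} |a_{z+}(\lambda)| < \infty, \qquad \inf_{\lambda \in \Pi} |a_{z+}(\lambda)| > 0.$$

Since $f_z(\lambda) \in \mathfrak{K}$, this also implies that $a_{z-}(\lambda) \in \mathfrak{K}$. In the same way we obtain that $b_{z\mp}(\lambda) \in \mathfrak{K}$. By equating the right-hand sides of (12.1.7) and (12.1.8) and multiplying them by $a_{z-}(\lambda) b_{z+}(\lambda)$, we get

$$a_{z+}(\lambda) b_{z+}(\lambda) = a_{z-}(\lambda) b_{z-}(\lambda), \quad \lambda \in \Pi. \tag{12.1.9}$$

Further, the functions $a_{z+}(\lambda)$ and $b_{z+}(\lambda)$ are bounded and analytic in $\Pi_+$ for the same reasons as the function $f_{z+}(\lambda)$ (see the proof of Theorem 12.1.1). Similarly, $a_{z-}(\lambda)$ and $b_{z-}(\lambda)$ are bounded and analytic in $\Pi_-$. We obtain that the function $a_{z+}(\lambda) b_{z+}(\lambda)$ is bounded and analytic in $\Pi_+$ and, by (12.1.9), has an entire bounded analytic continuation over the boundary $\Pi$ to the whole complex plane. This means that this function necessarily equals a constant $c$, and $b_{z+}(\lambda) = c\, a_{z+}^{-1}(\lambda) \in \mathfrak{K}_+$, $a_{z-}(\lambda) = c\, b_{z-}^{-1}(\lambda) \in \mathfrak{K}_-$, so relations (12.1.7) and (12.1.8) deliver a canonical factorisation of $f_z(\lambda)$.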

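Further, $e^{i\lambda x} \to 0$ as $\operatorname{Im}\lambda \to -\infty$, $x < 0$, and therefore

$$b_{z-}(-i\infty) = 1 - \mathbf{E}\big(z^{\eta_-^0};\ \chi_-^0 = 0,\ \eta_-^0 < \infty\big), \qquad a_{z-}(-i\infty) = 1,$$

$$a_{z-}(\lambda) b_{z-}(\lambda) = a_{z-}(-i\infty)\, b_{z-}(-i\infty) = 1 - \mathbf{E}\big(z^{\eta_-^0};\ \chi_-^0 = 0,\ \eta_-^0 < \infty\big) = D(z).$$

Substituting into (12.1.7) the value $a_{z-}(\lambda) = D(z)/b_{z-}(\lambda)$ derived from this equality, we obtain the assertion of the theorem. The second relation for $D(z)$ follows from the equality $D(z) = a_{z+}(i\infty)\, b_{z+}(i\infty)$. The theorem is proved. □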


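In the proof of Theorem 12.1.2 we used, in formula (12.1.6), a decomposition of $\mathbf{E} e^{i\lambda S_n}$ into summands corresponding to the disjoint events

$$\bigcup_{k=1}^{n} \{\eta_+^0 = k\} = \{\zeta_n \ge 0\} \quad \text{and} \quad \{\zeta_n < 0\}.$$

But the scheme of the proof will still work if we consider the partition of $\Omega$ into the events $\{\zeta_n > 0\}$ and $\{\zeta_n \le 0\}$. In order to do this, we introduce the random variables

$$\eta_+ := \min\{k : S_k > 0\}$$

($\eta_+ = \infty$ if $\zeta \le 0$; note that $\eta_+ = \eta(0)$ in the notation of Sect. 10.1),

$$\chi_+ := S_{\eta_+},$$

$$\eta_- := \min\{k : S_k < 0\} \quad (\eta_- = \infty \text{ if } \gamma \ge 0),$$

$$\chi_- := S_{\eta_-}.$$

The variable $\eta_+$ ($\eta_-$) is called the time of the first positive (negative) sum $\chi_+$ ($\chi_-$). Now we can write, together with equalities (12.1.7) and (12.1.8), the relations

$$f_z(\lambda) = 1 - z\varphi(\lambda) = \frac{1 - \mathbf{E}\big(e^{i\lambda\chi_+} z^{\eta_+};\ \eta_+ < \infty\big)}{\sum_{n=0}^{\infty} z^n\, \mathbf{E}\big(e^{i\lambda S_n};\ \zeta_n \le 0\big)} = \frac{1 - \mathbf{E}\big(e^{i\lambda\chi_-} z^{\eta_-};\ \eta_- < \infty\big)}{\sum_{n=0}^{\infty} z^n\, \mathbf{E}\big(e^{i\lambda S_n};\ \gamma_n \ge 0\big)}. \tag{12.1.10}$$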

Combining  these  relations  with  (12.1.7)  and  (12.1.8),  we  will  use  below  the  same 
argument  as  above  to  prove  the  following  assertion. 


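Theorem 12.1.3

$$\begin{aligned}
1 - \mathbf{E}\big(e^{i\lambda\chi_+^0} z^{\eta_+^0};\ \eta_+^0 < \infty\big) &= D(z)\big[1 - \mathbf{E}\big(e^{i\lambda\chi_+} z^{\eta_+};\ \eta_+ < \infty\big)\big],\\
1 - \mathbf{E}\big(e^{i\lambda\chi_-^0} z^{\eta_-^0};\ \eta_-^0 < \infty\big) &= D(z)\big[1 - \mathbf{E}\big(e^{i\lambda\chi_-} z^{\eta_-};\ \eta_- < \infty\big)\big].
\end{aligned} \tag{12.1.11}$$

Here the function $D(z)$ defined in Theorem 12.1.2 also satisfies the relations

$$D^{-1}(z) = \sum_{n=0}^{\infty} z^n\, \mathbf{P}(S_n = 0,\ \zeta_n \le 0) = \sum_{n=0}^{\infty} z^n\, \mathbf{P}(S_n = 0,\ \gamma_n \ge 0). \tag{12.1.12}$$

Clearly, from Theorem 12.1.3 one can obtain some other versions of the factorisation identity. For instance, one has

$$f_z(\lambda) = \big[1 - \mathbf{E}\big(e^{i\lambda\chi_+} z^{\eta_+};\ \eta_+ < \infty\big)\big]\big[1 - \mathbf{E}\big(e^{i\lambda\chi_-^0} z^{\eta_-^0};\ \eta_-^0 < \infty\big)\big]. \tag{12.1.13}$$

Representations (12.1.12) for $D(z)$ imply, in particular, that

$$\mathbf{P}(S_n = 0,\ \zeta_n \le 0) = \mathbf{P}(S_n = 0,\ \gamma_n \ge 0)$$

and that $D(z) = 1$ if $\mathbf{P}(S_n = 0) = 0$ for all $n \ge 1$.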


Proof of Theorem 12.1.3 Let us derive the first relation in (12.1.11). Comparing (12.1.8) with (12.1.10) we find, as above, that

$$\big[1 - \mathbf{E}\big(e^{i\lambda\chi_+} z^{\eta_+};\ \eta_+ < \infty\big)\big]\, b_{z+}(\lambda) = \text{const} = 1, \tag{12.1.14}$$

since the product equals 1 for $\lambda = i\infty$. Therefore we obtain (12.1.13) by virtue of (12.1.8). It remains to compare (12.1.13) with the identity of Theorem 12.1.2.

Expressions (12.1.12) for $D(z)$ follow if we recall (see (12.1.8) and (12.1.10)) that the left-hand side of (12.1.14) equals

$$\sum_{n=0}^{\infty} z^n\, \mathbf{E}\big(e^{i\lambda S_n};\ \zeta_n \le 0\big)\, \big[1 - \mathbf{E}\big(e^{i\lambda\chi_-^0} z^{\eta_-^0};\ \eta_-^0 < \infty\big)\big].$$

Since this product also equals 1, letting $\lambda = -i\infty$ here and in the second identity of (12.1.11) we get the first equality in (12.1.12). The second equality is proved in a similar way. □

Remark 12.1.1 It is important to note that Theorems 12.1.2 and 12.1.3, as well as proving the existence of the identities, also provide a means of finding the characteristic function of the joint distribution of $\chi$ and $\eta$. That is, if we manage somehow to get a representation for $f_z(\lambda) = 1 - z\varphi(\lambda)$ of the form $h_{z+}(\lambda) h_{z-}(\lambda)$, where $h_{z\pm}(\lambda) \in \mathfrak{K}_\pm$, then by uniqueness of the canonical factorisation we can, for instance, claim that, up to a constant factor, the function $1 - \mathbf{E}(e^{i\lambda\chi_+} z^{\eta_+};\ \eta_+ < \infty)$ coincides with $h_{z+}(\lambda)$. For examples of how such arguments can be used, see Sects. 12.5 and 12.6.


12.2  Some  Consequences  of  Theorems  12.1.1-12.1.3 
12.2.1  Direct  Consequences 

Theorems  12.1.1-12.1.3  (and  also  their  modifications  of  the  form  (12.1.13))  and 
the  uniqueness  of  the  canonical  factorisation  (see  Lemma  12.1.1)  directly  imply  the 
next  result. 

Corollary 12.2.1 In the notation of Theorems 12.1.1 and 12.1.2 one has the following equalities:

$$1 - \mathbf{E}\big(e^{i\lambda\chi_+} z^{\eta_+};\ \eta_+ < \infty\big) = f_{z+}(\lambda);$$

$$D(z) = C(z);$$

$$1 - \mathbf{E}\big(e^{i\lambda\chi_-} z^{\eta_-};\ \eta_- < \infty\big) = f_{z-}(\lambda).$$

Now we will obtain, as corollaries of Theorems 12.1.1-12.1.3, some further identities in which the parameter $z$ is fixed and equal to 1.

Corollary 12.2.2 Letting $z \to 1$ in (12.1.13) we obtain

$$f_1(\lambda) := 1 - \varphi(\lambda) = \big[1 - \mathbf{E}\big(e^{i\lambda\chi_+};\ \eta_+ < \infty\big)\big]\big[1 - \mathbf{E}\big(e^{i\lambda\chi_-^0};\ \eta_-^0 < \infty\big)\big]. \tag{12.2.1}$$

It is obvious that one can similarly write other identities of this type corresponding to the identities that can be derived from Theorems 12.1.1-12.1.3.

Clearly, identity (12.2.1) delivers a factorisation of the function $f_1(\lambda) = 1 - \varphi(\lambda)$, but this factorisation is not canonical since $f_1(0) = 0$ and $f_1(\lambda) \notin \mathfrak{K}$.

Corollary 12.2.3 If there exists $\mathbf{E}\xi = a < 0$ then $\mathbf{P}(\eta_-^0 < \infty) = 1$, $\mathbf{E}\chi_-^0$ exists, and $\mathbf{P}(\zeta \le 0) = a / \mathbf{E}\chi_-^0 > 0$.

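Proof The first relation follows from the law of large numbers, because

$$\mathbf{P}(\eta_-^0 > n) \le \mathbf{P}(S_n > 0) \to 0$$

as $n \to \infty$. Therefore, in the case under consideration, one has

$$\mathbf{E}\big(e^{i\lambda\chi_-^0};\ \eta_-^0 < \infty\big) = \mathbf{E} e^{i\lambda\chi_-^0}.$$

The existence of $\mathbf{E}\chi_-^0$ follows from Wald's identity $\mathbf{E}\chi_-^0 = a\, \mathbf{E}\eta_-^0$ and the theorems of Chap. 10, which imply that $\mathbf{E}\eta_-^0 \le \mathbf{E}\eta_- < \infty$, since $\mathbf{E}\eta_-$ is the value of the corresponding renewal function at 0.

Finally, dividing both sides of the identity in Corollary 12.2.2 by $\lambda$ and taking the limit as $\lambda \to 0$, we obtain

$$a = \big(1 - \mathbf{P}(\eta_+ < \infty)\big)\, \mathbf{E}\chi_-^0 = \mathbf{P}(\zeta \le 0)\, \mathbf{E}\chi_-^0.$$

□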

It is interesting to note that, as a consequence of this assertion, we can obtain the strong law of large numbers. Indeed, since $\{\zeta < \infty\}$ is a tail event and $\mathbf{P}(\zeta < \infty) \ge \mathbf{P}(\zeta \le 0)$, Corollary 12.2.3 implies that $\mathbf{P}(\zeta < \infty) = 1$ for $a < 0$. This means that the assertion of Theorem 10.5.3 holds, and it was this assertion that the strong law of large numbers was derived from.

Based  on  factorisation  identities,  we  will  obtain  below  a  generalisation  of  this 
law. 

In the remaining part of this chapter, to avoid trivial complications, we will be assuming that $\xi$ takes, with positive probability, both positive and negative values.

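Corollary 12.2.4 If $a = \mathbf{E}\xi = 0$ then $\mathbf{P}(\eta_+ < \infty) = \mathbf{P}(\eta_-^0 < \infty) = 1$, so that

$$1 - \varphi(\lambda) = \big(1 - \mathbf{E} e^{i\lambda\chi_+}\big)\big(1 - \mathbf{E} e^{i\lambda\chi_-^0}\big). \tag{12.2.2}$$

If, moreover, $\mathbf{E}\xi^2 = \sigma^2 < \infty$ then there exist $\mathbf{E}\chi_+$ and $\mathbf{E}\chi_-^0$, and

$$\mathbf{E}\chi_+\, \mathbf{E}\chi_-^0 = -\frac{\sigma^2}{2}.$$

Proof Consider the sequence $\tilde\xi_k := \xi_k - \varepsilon$, $\varepsilon > 0$. Denoting by $\tilde\zeta$, $\tilde\chi_-^0$ and $\tilde a$ the corresponding characteristics for the newly introduced sequence, we obtain by Corollary 12.2.3 that

$$\mathbf{P}(\zeta \le 0) \le \mathbf{P}(\tilde\zeta \le 0) = \frac{\tilde a}{\mathbf{E}\tilde\chi_-^0} = \frac{\varepsilon}{|\mathbf{E}\tilde\chi_-^0|},$$

where

$$\mathbf{E}\tilde\chi_-^0 \le \mathbf{E}(\tilde\xi;\ \tilde\xi \le 0) = \mathbf{E}(\xi - \varepsilon;\ \xi \le \varepsilon) \le \mathbf{E}(\xi;\ \xi \le 0) < 0.$$

So we can make the probability $\mathbf{P}(\zeta \le 0)$ arbitrarily small by choosing an appropriate $\varepsilon$, and thus $\mathbf{P}(\zeta \le 0) = \mathbf{P}(\eta_+ = \infty) = 0$. Similarly, we find that $\mathbf{P}(\gamma \ge 0) = 0$ and hence

$$\mathbf{P}(\eta_-^0 = \infty) \le \mathbf{P}(\eta_- = \infty) = \mathbf{P}(\gamma \ge 0) = 0.$$

The obtained relations and Corollary 12.2.2 yield identity (12.2.2).

In order to prove the second assertion of the corollary, divide both sides of identity (12.2.2) by $\lambda^2 = -(i\lambda)^2$ and let $\lambda \in \Pi$ tend to zero. Then the limit of the left-hand side will be equal to $\sigma^2/2$ (see (7.1.1)), whereas that of the right-hand side will be equal to $-\mathbf{E}\chi_+ \mathbf{E}\chi_-^0$, where $\mathbf{E}\chi_+ > 0$, $|\mathbf{E}\chi_-^0| > 0$. The corollary is proved. □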
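As a quick sanity check of (12.2.2), one may look at the simple symmetric walk. This special case is an illustrative assumption, not an example taken from the text: $\mathbf{P}(\xi = 1) = \mathbf{P}(\xi = -1) = 1/2$. Since the steps are $\pm 1$ and the walk is recurrent, the first positive sum is always 1, while the first nonpositive sum is $-1$ if the first step is down and $0$ otherwise.

```latex
% Worked check of (12.2.2) for the simple symmetric walk (illustrative assumption).
\begin{align*}
\varphi(\lambda) &= \cos\lambda, \qquad \sigma^2 = 1,\\
\chi_+ &= 1 \ \text{a.s.}, \qquad \mathbf{E}e^{i\lambda\chi_+} = e^{i\lambda},\\
\chi_-^0 &=
  \begin{cases}
    -1 & \text{if } \xi_1 = -1,\\
    \phantom{-}0 & \text{if } \xi_1 = 1,
  \end{cases}
  \qquad \mathbf{E}e^{i\lambda\chi_-^0} = \tfrac12 e^{-i\lambda} + \tfrac12,\\
\bigl(1 - e^{i\lambda}\bigr)\bigl(1 - \tfrac12 e^{-i\lambda} - \tfrac12\bigr)
  &= \tfrac12 \bigl(1 - e^{i\lambda}\bigr)\bigl(1 - e^{-i\lambda}\bigr)
   = 1 - \cos\lambda,\\
\mathbf{E}\chi_+\,\mathbf{E}\chi_-^0 &= 1 \cdot \bigl(-\tfrac12\bigr) = -\frac{\sigma^2}{2}.
\end{align*}
```

Both assertions of Corollary 12.2.4 are thus confirmed in this special case.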


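Corollary 12.2.5

1. We always have $\sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k = 0)}{k} < \infty$.

2. The following three conditions are equivalent:

(a) $\mathbf{P}(\zeta < \infty) = 1$;

(b) $\mathbf{P}(\zeta \le 0) = \mathbf{P}(\eta_+ = \infty) > 0$;

(c) $\sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k > 0)}{k} < \infty$ or $\sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k \ge 0)}{k} < \infty$.

Proof To obtain the first assertion, one should let $z \to 1$ in the second equality in Corollary 12.2.1 and recall that

$$D(1) = 1 - \mathbf{P}(\chi_+^0 = 0,\ \eta_+^0 < \infty) \ge \mathbf{P}(\xi > 0) > 0.$$

The equivalence of (b) and (c) follows from the equality

$$1 - \mathbf{P}(\eta_+ < \infty) = \mathbf{P}(\zeta \le 0) = \exp\Big\{-\sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k > 0)}{k}\Big\},$$

which is derived by putting $\lambda = 0$ and letting $z \to 1$ in the first identity of Corollary 12.2.1.

Now we will establish the equivalence of (a) and (b). If $\mathbf{P}(\zeta \le 0) > 0$ then $\mathbf{P}(\zeta < \infty) > 0$ and hence $\mathbf{P}(\zeta < \infty) = 1$, since $\{\zeta < \infty\}$ is a tail event. Conversely, let $\zeta$ be a proper random variable. Choose an $N$ such that $\mathbf{P}(\zeta \le N) > 0$, and $b > 0$ such that $k = N/b$ is an integer and $\mathbf{P}(\xi < -b) > 0$. Then

$$\{\zeta \le 0\} \supset \Big\{\xi_1 < -b,\ \ldots,\ \xi_k < -b,\ \sup_{j \ge 1}\big(-bk + \xi_{k+1} + \cdots + \xi_{k+j}\big) \le 0\Big\}.$$

Since the sequence $\xi_{k+1}, \xi_{k+2}, \ldots$ is distributed identically to $\xi_1, \xi_2, \ldots$, one has

$$\mathbf{P}(\zeta \le 0) \ge \big[\mathbf{P}(\xi < -b)\big]^k\, \mathbf{P}(\zeta \le bk) > 0.$$

□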


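Corollary 12.2.6

1. $\mathbf{P}(\zeta < \infty,\ \gamma > -\infty) = 0$.

2. If there exists $\mathbf{E}\xi = a < 0$ then

$$\mathbf{P}(\eta_+ < \infty) < 1, \qquad \mathbf{P}(\zeta < \infty,\ \gamma = -\infty) = 1,$$

$$\sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k > 0)}{k} < \infty, \qquad \sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k < 0)}{k} = \infty.$$

3. If there exists $\mathbf{E}\xi = a = 0$ then

$$\mathbf{P}(\zeta = \infty,\ \gamma = -\infty) = 1,$$

$$\sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k > 0)}{k} = \infty, \qquad \sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k < 0)}{k} = \infty.$$

Here we do not consider the case $a > 0$ since it is "symmetric" to the case $a < 0$.

Proof The first assertion follows from the fact that at least one of the two series $\sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k < 0)}{k}$ and $\sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k \ge 0)}{k}$ diverges. Therefore, by Corollary 12.2.5 either $\mathbf{P}(\gamma = -\infty) = 1$ or $\mathbf{P}(\zeta = \infty) = 1$.

The second and third assertions follow from Corollaries 12.2.3-12.2.5 in an obvious way. □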


12.2.2  A  Generalisation  of  the  Strong  Law  of  Large  Numbers 

The  above  mentioned  generalisation  of  the  strong  law  of  large  numbers  consists  of 
the  following. 


Theorem 12.2.1 (The one-sided law of large numbers) Convergence of the series

$$\sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k > \varepsilon k)}{k}$$

for every $\varepsilon > 0$ is a necessary and sufficient condition for

$$\mathbf{P}\Big(\limsup_{n \to \infty} \frac{S_n}{n} \le 0\Big) = 1. \tag{12.2.3}$$

Proof Sufficiency. If the series converges then by Corollary 12.2.5 we have

$$\mathbf{P}\Big(\sup_{k}\{S_k - \varepsilon k\} < \infty\Big) = 1.$$

Hence $\{\varepsilon n\}$ is an upper sequence for $\{S_n\}$ and

$$\mathbf{P}\Big(\limsup_{k \to \infty} \frac{S_k}{k} \le \varepsilon\Big) = 1.$$

But since $\varepsilon$ is arbitrary, we see that

$$\mathbf{P}\Big(\limsup_{k \to \infty} \frac{S_k}{k} \le 0\Big) = 1.$$

Necessity. Conversely, if equality (12.2.3) holds then, for any $\varepsilon > 0$, with probability 1 we have $S_n/n < \varepsilon$ for all $n$ large enough. This means that $\sup_k (S_k - \varepsilon k) < \infty$ with probability 1, and hence by Corollary 12.2.5 the series $\sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k > \varepsilon k)}{k}$ converges. The theorem is proved. □


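Corollary 12.2.7 With probability 1 we have

$$\limsup_{n \to \infty} \frac{S_n}{n} = a,$$

where

$$a = \inf\Big\{b : \sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k > bk)}{k} < \infty\Big\}.$$

Proof For any $b > a$, the series in the definition of the number $a$ converges. Since $\{\limsup_{n \to \infty} S_n/n \le b\}$ is a tail event and $S'_n = S_n - bn$ again form a sequence of sums of independent identically distributed random variables, Theorem 12.2.1 immediately implies that

$$\mathbf{P}\Big(\limsup_{n \to \infty} \frac{S_n}{n} \le b\Big) = 1,$$

$$\mathbf{P}\Big(\limsup_{n \to \infty} \frac{S_n}{n} \le a\Big) = \mathbf{P}\Big(\bigcap_{k=1}^{\infty} \Big\{\limsup_{n \to \infty} \frac{S_n}{n} \le a + \frac{1}{k}\Big\}\Big) = 1.$$

If we assume that $\mathbf{P}(\limsup S_n/n \le a^*) = 1$ for $a^* < a$ then, for $\xi_k^* = \xi_k - a^*$ and $S_k^* = \sum_{j=1}^{k} \xi_j^*$, we will have $\limsup S_n^*/n \le 0$, and

$$\sum_{k=1}^{\infty} \frac{\mathbf{P}\big(S_k > (a^* + \varepsilon)k\big)}{k} < \infty$$

for any $\varepsilon > 0$, which contradicts the definition of $a$. The corollary is proved. □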


In order to derive the conventional law of large numbers from Theorem 12.2.1 it suffices to use Corollary 12.2.7 and assertion 2 of Corollary 12.2.6. We obtain that in the case $\mathbf{E}\xi = 0$ the value of $a$ in Corollary 12.2.7 is 0 and hence $\limsup S_n/n = 0$ with probability 1. One can establish in the same way that $\liminf S_n/n = 0$. □


12.3 Pollaczek-Spitzer’s Identity. An Identity for $S = \sup_{k \ge 0} S_k$

It is important to note that, besides Theorems 12.1.1 and 12.1.2, there exist a number of factorisation identities that give explicit representations (in terms of factorisation components) for ch.f.s of the so-called boundary functionals of the trajectory of the random walk $\{S_k\}$, i.e. functionals associated with the crossing by the trajectory of $\{S_k\}$ of certain levels (not just the zero level, as in Theorems 12.1.1-12.1.3). The functionals

$$\overline{S}_n = \max_{k \le n} S_k, \qquad \theta_n = \min\{k : S_k = \overline{S}_n\}$$

and some others are also among the boundary functionals. For instance, for the triple transform of the joint distribution of $(\overline{S}_n, \theta_n)$, the following representation is valid. For $|z| < 1$, $|\rho| < 1/|z|$ and $\operatorname{Im}\lambda \ge 0$, one has

$$(1 - z) \sum_{n=0}^{\infty} z^n\, \mathbf{E}\big(\rho^{\theta_n} e^{i\lambda\overline{S}_n}\big) = \frac{f_{z+}(0)}{f_{z\rho+}(\lambda)}.$$

(For more detail on factorisation identities, see [3].)

Among  many  consequences  of  this  identity  we  will  highlight  two  results  that  can 
also  be  established  using  the  already  available  Theorems  12.1.1-12.1.3. 


12.3.1  Pollaczek-Spitzer’s  Identity 


So far we have obtained several factorisation identities as relations for numerators in representations (12.1.7), (12.1.8) and (12.1.9). Now we turn to the denominators. We will obtain one more identity playing an important role in studying the distributions of

$$\overline{S}_n = \max(0, \zeta_n) = \max(0, S_1, \ldots, S_n).$$

This is the so-called Pollaczek-Spitzer identity relating the ch.f.s of $\overline{S}_n$, $n = 1, 2, \ldots$, with those of $\max(0, S_n)$, $n = 1, 2, \ldots$.

Theorem 12.3.1 For $|z| < 1$ and $\operatorname{Im}\lambda \ge 0$,

$$\sum_{n=0}^{\infty} z^n\, \mathbf{E} e^{i\lambda\overline{S}_n} = \exp\Big\{\sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{E} e^{i\lambda\max(0, S_k)}\Big\}.$$

Using the notation of Theorem 12.1.1, one could write the right-hand side of this identity as

$$\frac{f_{z+}(0)}{(1 - z) f_{z+}(\lambda)}$$

(see the last relation in the proof of the theorem).


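Proof Theorems 12.1.1-12.1.3 (as well as their modifications of the form (12.1.13)) and the uniqueness of the canonical factorisation imply that

$$\sum_{n=0}^{\infty} z^n\, \mathbf{E}\big(e^{i\lambda S_n};\ \zeta_n < 0\big) = \big[1 - \mathbf{E}\big(e^{i\lambda\chi_-} z^{\eta_-};\ \eta_- < \infty\big)\big]^{-1} = f_{z-}^{-1}(\lambda),$$

where we assume that $\mathbf{E}(e^{i\lambda S_0};\ \zeta_0 < 0) = 1$, so all the functions in the above relation turn into 1 at $\lambda = -i\infty$. Set

$$S_k^* := S_{n-k} - S_n,\ k = 0, \ldots, n, \qquad \overline{S}_n^* := \max(0, S_1^*, \ldots, S_n^*), \qquad \theta_n^* := \min\{k : S_k^* = \overline{S}_n^*\}$$

($\theta_n^*$ is the time of the first maximum in the sequence $0, S_1^*, \ldots, S_n^*$). Then the event $\{S_n \in dx,\ \zeta_n < 0\}$ can be rewritten as $\{S_n^* \in -dx,\ \theta_n^* = n\}$. This implies that

$$\mathbf{E}\big(e^{i\lambda S_n};\ \zeta_n < 0\big) = \mathbf{E}\big(e^{-i\lambda S_n^*};\ \theta_n^* = n\big),$$

$$\sum_{n=0}^{\infty} z^n\, \mathbf{E}\big(e^{-i\lambda\overline{S}_n^*};\ \theta_n^* = n\big) = f_{z-}^{-1}(\lambda). \tag{12.3.1}$$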

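But the sequence $S_1^*, \ldots, S_n^*$ is distributed identically to the sequence of sums $\xi_1^*,\ \xi_1^* + \xi_2^*,\ \ldots,\ \xi_1^* + \cdots + \xi_n^*$, where $\xi_k^* = -\xi_k$. If we put $\theta_n := \min\{k : S_k = \overline{S}_n\}$ then identity (12.3.1) can be equivalently rewritten as

$$\sum_{n=0}^{\infty} z^n\, \mathbf{E}\big(e^{i\lambda\overline{S}_n};\ \theta_n = n\big) = \big(f_{z-}^*(-\lambda)\big)^{-1},$$

where $f_{z-}^*(\lambda)$ is the negative factorisation component of the function $1 - z\varphi^*(\lambda) = 1 - z\varphi(-\lambda)$ corresponding to the random variable $-\xi$. Since

$$1 - z\varphi(-\lambda) = f_{z+}(-\lambda)\, C(z)\, f_{z-}(-\lambda)$$

and the function $f_{z+}(-\lambda)$ possesses all the properties of the negative component $f_{z-}^*(\lambda)$ of the factorisation of $1 - z\varphi^*(\lambda)$, while the function $f_{z-}(-\lambda)$ has all the properties of a positive component, we see that $f_{z-}^*(\lambda) = f_{z+}(-\lambda)$ and

$$\sum_{n=0}^{\infty} z^n\, \mathbf{E}\big(e^{i\lambda\overline{S}_n};\ \theta_n = n\big) = \frac{1}{f_{z+}(\lambda)}.$$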
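Now we note that

$$\begin{aligned}
\mathbf{E} e^{i\lambda\overline{S}_n} &= \sum_{k=0}^{n} \mathbf{E}\big(e^{i\lambda\overline{S}_n};\ \theta_n = k\big)\\
&= \sum_{k=0}^{n} \mathbf{E}\big(e^{i\lambda\overline{S}_k};\ \theta_k = k,\ S_{k+1} - S_k \le 0,\ \ldots,\ S_n - S_k \le 0\big)\\
&= \sum_{k=0}^{n} \mathbf{E}\big(e^{i\lambda\overline{S}_k};\ \theta_k = k\big)\, \mathbf{P}(\overline{S}_{n-k} = 0).
\end{aligned}$$

Since the right-hand side is the convolution of two sequences, we obtain that

$$\sum_{n=0}^{\infty} z^n\, \mathbf{E} e^{i\lambda\overline{S}_n} = \sum_{n=0}^{\infty} z^n\, \mathbf{P}(\overline{S}_n = 0) \cdot \frac{1}{f_{z+}(\lambda)}.$$

Putting $\lambda = 0$ we get

$$\sum_{n=0}^{\infty} z^n\, \mathbf{P}(\overline{S}_n = 0) = \frac{f_{z+}(0)}{1 - z}.$$

Therefore,

$$\begin{aligned}
\sum_{n=0}^{\infty} z^n\, \mathbf{E} e^{i\lambda\overline{S}_n} &= \frac{f_{z+}(0)}{(1 - z) f_{z+}(\lambda)}\\
&= \exp\Big\{-\ln(1 - z) + \sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{E}\big(e^{i\lambda S_k};\ S_k > 0\big) - \sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{P}(S_k > 0)\Big\}\\
&= \exp\Big\{\sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{E}\big(e^{i\lambda S_k};\ S_k > 0\big) + \sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{P}(S_k \le 0)\Big\}\\
&= \exp\Big\{\sum_{k=1}^{\infty} \frac{z^k}{k}\, \mathbf{E} e^{i\lambda\max(0, S_k)}\Big\}.
\end{aligned}$$

The theorem is proved. □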
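The identity of Theorem 12.3.1 can be verified numerically for a lattice walk by computing the joint distribution of the walk and its running maximum exactly. The sketch below does this under illustrative assumptions: the step distribution `step_pmf`, the values of `z` and `lam`, the truncation level `n_max` and the function name `walk_max_chfs` are hypothetical choices, not taken from the text.

```python
import cmath


def walk_max_chfs(step_pmf, lam, n_max):
    """Exact ch.f.s for an integer-valued random walk.

    Returns two lists:
      chf_max[n] = E exp(i*lam*max(0, S_1, ..., S_n)),  n = 0..n_max
      chf_pos[k-1] = E exp(i*lam*max(0, S_k)),          k = 1..n_max
    """
    joint = {(0, 0): 1.0}   # (S_n, running maximum including S_0 = 0) -> probability
    chf_max = [1 + 0j]      # for n = 0 the maximum equals 0
    chf_pos = []
    for _ in range(n_max):
        nxt = {}
        for (s, m), pr in joint.items():
            for x, px in step_pmf.items():
                key = (s + x, max(m, s + x))
                nxt[key] = nxt.get(key, 0.0) + pr * px
        joint = nxt
        chf_max.append(sum(pr * cmath.exp(1j * lam * m) for (s, m), pr in joint.items()))
        chf_pos.append(sum(pr * cmath.exp(1j * lam * max(0, s)) for (s, m), pr in joint.items()))
    return chf_max, chf_pos


# Hypothetical example: steps -1, 0, 1 with probabilities 0.5, 0.2, 0.3.
step_pmf = {-1: 0.5, 0: 0.2, 1: 0.3}
z, lam, n_max = 0.4, 0.9, 40
chf_max, chf_pos = walk_max_chfs(step_pmf, lam, n_max)

# Left-hand and right-hand sides of the identity, both truncated at n_max terms.
lhs = sum(z ** n * chf_max[n] for n in range(n_max + 1))
rhs = cmath.exp(sum(z ** k / k * chf_pos[k - 1] for k in range(1, n_max + 1)))
print(abs(lhs - rhs))
```

With $z = 0.4$ the neglected tails are of order $0.4^{41}$, so the printed difference reflects rounding error only.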


12.3.2  An  Identity  for  S  =  supfc>0  S/( 


The  second  useful  identity  to  be  discussed  in  this  subsection  is  associated  with  the 
distribution  of  the  random  variable  S  =  sup^>0  Sf  =  max(0,  f )  (of  course,  we  deal 
here  with  the  cases  when  P(S  <  oo)  =  1).  This  distribution  is  of  interest  in  many 
applications.  Two  such  illustrative  applications  will  be  discussed  in  the  next  subsec¬ 
tion. 

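We will establish the relationship of the distribution of $S$ with that of the vector
$(\chi_+, \eta_+)$ and with the factorisation components of the function $1 - z\varphi(\lambda)$.

First of all, note that the random variable $\eta_+$ is a Markov time. For such variables,
one can easily see (cf. Lemma 10.2.1) that the sequence $\xi_1^* = \xi_{\eta_+ + 1}$, $\xi_2^* = \xi_{\eta_+ + 2}, \ldots$
on the set $\{\omega : \eta_+ < \infty\}$ (or given that $\eta_+ < \infty$) is distributed identically to
$\xi_1, \xi_2, \ldots$ and does not depend on $(\eta_+, \xi_1, \ldots, \xi_{\eta_+})$. Indeed,

$$
\mathbf{P}\bigl(\xi_1^* \in B_1, \ldots, \xi_k^* \in B_k \mid \eta_+ = j,\ \xi_1 \in A_1, \ldots, \xi_{\eta_+} \in A_{\eta_+}\bigr)
= \mathbf{P}\bigl(\xi_{j+1} \in B_1, \ldots, \xi_{j+k} \in B_k \mid \xi_1 \in A_1, \ldots, \xi_j \in A_j;\ \eta_+ = j\bigr)
= \mathbf{P}(\xi_1 \in B_1, \ldots, \xi_k \in B_k),
$$

where the last equality holds because the event $\{\xi_1 \in A_1, \ldots, \xi_j \in A_j;\ \eta_+ = j\}$ is determined by $\xi_1, \ldots, \xi_j$ only.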


Considering the new sequence $\{\xi_k^*\}_{k=1}^{\infty}$ we note that it will exceed level 0 (the
level $\chi_+$ for the original sequence) with probability $p = \mathbf{P}(\eta_+ < \infty)$, and that the
distribution of $\zeta^* = \sup_{k \ge 1}(\xi_1^* + \cdots + \xi_k^*)$ coincides with the distribution of
$\zeta = \sup_{k \ge 1} S_k$.

Thus, with $S^* := \max(0, \zeta^*)$, we have

$$
S = S(\omega) =
\begin{cases}
0 & \text{on } \{\omega : \eta_+ = \infty\}, \\
S_{\eta_+} + S^* = \chi_+ + S^* & \text{on } \{\omega : \eta_+ < \infty\}.
\end{cases}
$$

Since, as has already been noted, $S^*$ does not depend on $\chi_+$ and $\eta_+$, and the distribution
of $S^*$ coincides with that of $S$, we have

$$
\mathbf{E}e^{i\lambda S} = \mathbf{P}(\eta_+ = \infty) + \mathbf{E}\bigl(e^{i\lambda(\chi_+ + S^*)};\ \eta_+ < \infty\bigr)
= (1 - p) + \mathbf{E}e^{i\lambda S}\,\mathbf{E}\bigl(e^{i\lambda \chi_+};\ \eta_+ < \infty\bigr).
$$

This  implies  the  following  result. 

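Theorem 12.3.2 If $S < \infty$ a.s. or, which is the same, $p = \mathbf{P}(\eta_+ < \infty) < 1$,
then

$$
\mathbf{E}e^{i\lambda S} = \frac{1 - p}{1 - \mathbf{E}\bigl(e^{i\lambda \chi_+};\ \eta_+ < \infty\bigr)} = \frac{1 - p}{\mathfrak{f}_{1+}(\lambda)}.
$$

In exactly the same way we can obtain the relation

$$
\mathbf{E}e^{i\lambda S} = \frac{1 - p_0}{1 - \mathbf{E}\bigl(e^{i\lambda \chi_+^0};\ \eta_+^0 < \infty\bigr)}, \qquad (12.3.2)
$$

where $p_0 = \mathbf{P}(\eta_+^0 < \infty) < 1$.

In this case, one can write a factorisation identity in the following form:

$$
1 - \varphi(\lambda) = \frac{(1 - p_0)\bigl(1 - \mathbf{E}e^{i\lambda \chi_-}\bigr)}{\mathbf{E}e^{i\lambda S}} = \frac{(1 - p)\bigl(1 - \mathbf{E}e^{i\lambda \chi_-^0}\bigr)}{\mathbf{E}e^{i\lambda S}}. \qquad (12.3.3)
$$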

In  Sects.  12.5-12.7  we  will  discuss  the  possibility  of  finding  the  explicit  form 
and  the  asymptotic  properties  of  the  distribution  of  S. 


12.4  The  Distribution  of  S  in  Insurance  Problems  and  Queueing 
Theory 

In this section we show that the need to analyse the distribution of the variable $S$
considered in Sect. 12.3 arises in insurance problems and also when studying queueing
systems.


12.4.1  Random  Walks  in  Risk  Theory 

Consider the following simplified model of an insurance business operation. Denote
by $x$ the initial surplus of the company and consider the daily dynamics of
the surplus. During the $k$-th day the company receives insurance premiums at the
rate $\xi_k^+ \ge 0$ and pays out claims made by insured persons at the rate $\xi_k^- \ge 0$ (in
case of a fire, a traffic accident, and so on). The amounts $\xi_k = \xi_k^- - \xi_k^+$ are random
since they depend on the number of newly insured persons, the size of premiums,
claim amounts and so on. For a foreseeable "homogeneous" time period,
the amounts $\xi_k$ can be assumed to be independent and identically distributed. If we
put $S_n := \sum_{k=1}^{n} \xi_k$ then the company's surplus after $n$ days will be $Z_n = x - S_n$,
provided that we allow it to be negative. But if we assume that the company is ruined at
the time when $Z_n$ first becomes negative, then the probability of no ruin during the
first $n$ days equals

$$
\mathbf{P}\Bigl(\min_{k \le n} Z_k \ge 0\Bigr) = \mathbf{P}(\overline{S}_n \le x),
$$

where, as above, $\overline{S}_n = \max_{k \le n} S_k$. Accordingly, the probability of ruin within $n$ days
is equal to $\mathbf{P}(\overline{S}_n > x)$, and the probability of ruin in the long run can be identified
with $\mathbf{P}(S > x)$. It follows that, for the probability of ruin to be less than 1, it is
necessary that $\mathbf{E}\xi_k < 0$ or, which is the same, that $\mathbf{E}\xi_k^- < \mathbf{E}\xi_k^+$. When this condition is
satisfied, in order to make the probability of ruin small enough, one has to make the
initial surplus $x$ large enough. In this connection it is of interest to find the explicit
form of the distribution of $S$, or at least the asymptotic behaviour of $\mathbf{P}(S > x)$ as
$x \to \infty$. Sections 12.5-12.7 will be focused on this.
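
The equivalence between ruin within $n$ days and the event that the running maximum of the partial sums exceeds the initial surplus can be checked on a concrete path. The following Python sketch is only an illustration: the function names `running_max` and `ruined_within` and the sample increments are assumptions made for this example and do not come from the text.

```python
from itertools import accumulate


def running_max(xi):
    """Return max(0, S_1, ..., S_n) for the increments xi (S_0 = 0 is included)."""
    best = 0.0
    for s in accumulate(xi):
        best = max(best, s)
    return best


def ruined_within(x, xi):
    """True if the surplus Z_k = x - S_k is negative for some k <= len(xi)."""
    return running_max(xi) > x


# Hypothetical daily increments (claims minus premiums).
xi = [-1.0, 2.5, -0.5, 1.5, -3.0]

print(running_max(xi))         # largest partial sum along the path
print(ruined_within(2.0, xi))  # small initial surplus
print(ruined_within(3.0, xi))  # larger initial surplus
```

The check uses only the running maximum, which mirrors the reduction of the ruin problem to the distribution of the maximum of the partial sums.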


12.4.2  Queueing  Systems 


Imagine that "customers" who are to be served by a certain system arrive with time
intervals $\tau_1, \tau_2, \ldots$ between successive arrivals. These could be phone calls, planes
landing at an airport, clients in a shop, messages to be processed by a computer,
etc. Assume that serving the $k$-th customer (the first customer arrived at time 0, the
second at time $\tau_1$, and so on) requires time $s_k$, $k = 1, 2, \ldots$ If, at the time of the
$k$-th customer's arrival, the system was busy serving one of the preceding customers,
the newly arrived customer joins the "queue" and waits for service which starts
immediately after the system has finished serving all the preceding customers. The
problem is to find the distribution of the waiting time $w_n$ of the $n$-th customer, that is, the
time spent waiting for the service.

Let us find out how the quantities $w_{n+1}$ and $w_n$ are related to each other. The
$(n+1)$-th customer arrived $\tau_n$ time units after the $n$-th customer, but will have to
wait for an extra $s_n$ time units during the service of the $n$-th customer. Therefore,

$$
w_{n+1} = w_n - \tau_n + s_n,
$$

only if $w_n - \tau_n + s_n \ge 0$. If $w_n - \tau_n + s_n < 0$ then clearly $w_{n+1} = 0$. Thus, if we
put $\xi_{n+1} := s_n - \tau_n$, then

$$
w_{n+1} = \max(0,\ w_n + \xi_{n+1}), \qquad n \ge 1, \qquad (12.4.1)
$$

with the initial value $w_1 \ge 0$. Let us find the solution to this recurrence equation.
Let, as above, $S_n = \sum_{k=1}^{n} \xi_k$ (to simplify the notation, from here on the increments
are re-indexed so that (12.4.1) reads $w_{k+1} = \max(0, w_k + \xi_k)$). Denote by $\theta(n)$ the time
when the trajectory of $0, S_1, \ldots, S_n$ first attains its minimum:

$$
\theta(n) := \min\{k : S_k = \underline{S}_n\}, \qquad \underline{S}_n := \min_{0 \le j \le n} S_j.
$$

Then clearly (for $w_0 := w_1$)

$$
w_{n+1} = w_1 + S_n \quad \text{if } w_{\theta(n)} = w_1 + \underline{S}_n > 0 \qquad (12.4.2)
$$

(since in this case the right-hand side of (12.4.1) does not vanish and $w_{k+1} = w_k + \xi_k$
for all $k \le n$), and

$$
w_{n+1} = S_n - S_{\theta(n)} \quad \text{if } w_1 + \underline{S}_n \le 0 \qquad (12.4.3)
$$

($w_{\theta(n)} = 0$ and $w_{k+1} = w_k + \xi_k$ for all $k \ge \theta(n)$). Put

$$
S_{n,j} := \sum_{k=n-j+1}^{n} \xi_k, \qquad \overline{S}_{n,n} := \max_{0 \le j \le n} S_{n,j},
$$

so that $S_{n,0} = 0$, $S_{n,n} = S_n$. Then

$$
S_n - S_{\theta(n)} = \max_{0 \le j \le n}(S_n - S_j) = \overline{S}_{n,n},
$$

so that $w_1 + \underline{S}_n = w_1 + S_n - \overline{S}_{n,n}$ and the inequality $w_1 + \underline{S}_n \le 0$ in (12.4.3) is
equivalent to the inequality $\overline{S}_{n,n} = S_n - S_{\theta(n)} \ge w_1 + S_n$. Therefore (12.4.2) and
(12.4.3) can be rewritten as

$$
w_{n+1} = \max(\overline{S}_{n,n},\ w_1 + S_n). \qquad (12.4.4)
$$
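
The agreement between the recursion (12.4.1) and the closed form (12.4.4) can be verified numerically on a short path. The sketch below is an illustration under stated assumptions: the increments are indexed so that $w_{k+1} = \max(0, w_k + \xi_k)$, and the function names `lindley_wait` and `closed_form_wait` as well as the sample data are hypothetical.

```python
def lindley_wait(w1, xi):
    """Iterate w_{k+1} = max(0, w_k + xi_k) and return the last waiting time."""
    w = w1
    for x in xi:
        w = max(0, w + x)
    return w


def closed_form_wait(w1, xi):
    """Evaluate max(max_{0<=j<=n} S_{n,j}, w1 + S_n).

    S_{n,j} is the sum of the last j increments, with S_{n,0} = 0.
    """
    best = 0   # S_{n,0} = 0
    tail = 0   # running sum of the last j increments
    for x in reversed(xi):
        tail += x
        best = max(best, tail)
    return max(best, w1 + sum(xi))


# Hypothetical integer increments (service time minus inter-arrival time).
xi = [2, -3, 1, -5, 4, 1]
w1 = 1

for n in range(len(xi) + 1):
    print(n, lindley_wait(w1, xi[:n]), closed_form_wait(w1, xi[:n]))
```

Integer increments are used so that both computations are exact and can be compared with equality.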

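This implies that, for each fixed $x > 0$,

$$
\mathbf{P}(w_{n+1} > x) = \mathbf{P}(\overline{S}_{n,n} > x) + \mathbf{P}(\overline{S}_{n,n} \le x,\ w_1 + S_n > x).
$$

Now assume that $\xi_k \stackrel{d}{=} \xi$ are independent and identically distributed with $\mathbf{E}\xi < 0$.
Then $\overline{S}_{n,n} \stackrel{d}{=} \overline{S}_n$ and, as $n \to \infty$, we have $S_n \to -\infty$ a.s., $\mathbf{P}(w_1 + S_n > x) \to 0$ and
$\mathbf{P}(\overline{S}_n > x) \uparrow \mathbf{P}(S > x)$. We conclude that, for any initial value $w_1$, the following
limit exists:

$$
\lim_{n \to \infty} \mathbf{P}(w_n > x) = \mathbf{P}(S > x).
$$

This distribution is called the stationary waiting time distribution. We already
know that it will be proper if $\mathbf{E}\xi = \mathbf{E}s_1 - \mathbf{E}\tau_1 < 0$. As in the previous section, here
arises the problem of finding the distribution of $S$. If, on the other hand, $\mathbf{E}s_1 > \mathbf{E}\tau_1$
or $\mathbf{E}s_1 = \mathbf{E}\tau_1$ and $s_1 \not\equiv \tau_1$ then the "stationary" waiting time will be infinite.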


12.4.3  Stochastic  Models  in  Continuous  Time 

In the theory of queueing systems and risk theory one can equally well employ
stochastic models in continuous time, when, instead of random walks $\{S_n\}$, one uses
generalised renewal processes $Z(t)$ as described in Sect. 10.6. For a given sequence
of independent identically distributed random vectors $(\tau_j, \zeta_j)$, the process $Z(t)$ is
defined by the equality

$$
Z(t) := Z_{\nu(t)},
$$

where

$$
Z_k := \sum_{j=1}^{k} \zeta_j, \qquad \nu(t) := \max\{k : T_k \le t\}, \qquad T_k := \sum_{j=1}^{k} \tau_j.
$$

For instance, in risk theory, the capital inflow during time $t$ that comes from
regular premium payments can be described by the function $qt$, $q > 0$. The insurer
covers claims of sizes $\zeta_1, \zeta_2, \ldots$ with time intervals $\tau_1, \tau_2, \ldots$ between them (the
first claim is covered at time $T_1$). Thus, if the initial surplus is $x$, then the surplus at
time $t$ will be

$$
x + qt - Z_{\nu(t)} = x + qt - Z(t).
$$

The insurer is ruined if $\inf_t (x + qt - Z(t)) < 0$ or, which is the same,

$$
\sup_t\, (Z(t) - qt) > x.
$$

It is not hard to see that

$$
\sup_t\, (Z_{\nu(t)} - qt) = \sup_{k \ge 0} S_k =: S,
$$

where $S_k = \sum_{j=1}^{k} \xi_j$, $\xi_j = \zeta_j - q\tau_j$. Thus the continuous-time version of the ruin
problem for an insurance company also reduces to finding the distribution of the
maximum of the cumulative sums.
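
The reduction of the continuous-time supremum to a supremum over the embedded random walk rests on the fact that $Z(t) - qt$ decreases between claims and can only jump upwards at the claim times $T_k$. A minimal numerical sketch of this observation, on one finite hypothetical path, is given below; the function names and the sample data are assumptions made for illustration only.

```python
import bisect
from itertools import accumulate


def sup_via_partial_sums(tau, zeta, q):
    """sup_{k>=0} S_k with S_k = sum_{j<=k} (zeta_j - q * tau_j), S_0 = 0."""
    best, s = 0.0, 0.0
    for t, z in zip(tau, zeta):
        s += z - q * t
        best = max(best, s)
    return best


def claims_minus_premiums(t, tau, zeta, q):
    """Z(t) - q*t, where Z(t) is the total of the claims with T_k <= t."""
    T = list(accumulate(tau))            # claim times T_1, T_2, ...
    Z = [0.0] + list(accumulate(zeta))   # Z_0 = 0, Z_1, Z_2, ...
    k = bisect.bisect_right(T, t)        # nu(t): number of claims up to time t
    return Z[k] - q * t


# Hypothetical path: inter-claim times, claim sizes and premium rate.
tau = [1.0, 2.0, 0.5, 3.0]
zeta = [3.0, 1.0, 2.0, 0.5]
q = 1.0

grid = [i / 20 for i in range(131)]      # time grid on [0, 6.5]
grid_sup = max(claims_minus_premiums(t, tau, zeta, q) for t in grid)

print(grid_sup, sup_via_partial_sums(tau, zeta, q))
```

On this path the grid contains every claim time, so the supremum over the grid coincides with the maximum of the partial sums.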


12.5  Cases  Where  Factorisation  Components  Can  Be  Found  in 
an  Explicit  Form.  The  Non-lattice  Case 

As was already noted, the boundary functionals of random walks that were considered
in Sects. 12.1-12.3 appear in many applied problems (see e.g., Sect. 12.4). This
raises the question: in what cases can one find, in an explicit form, the factorisation
components and hence the explicit form of the distributions of the boundary functionals
we need? Here we will deal with factorisation of the function $1 - \varphi(\lambda)$ and will be
interested in the boundary functionals $\chi_\pm$ and $S$.


12.5.1 Preliminary Notes on the Uniqueness of Factorisation

As was already mentioned, the factorisation of the function $1 - \varphi(\lambda)$ obtained in
Corollaries 12.2.2 and 12.2.4 is not canonical since that function vanishes at $\lambda = 0$.
In this connection arises the question of whether a factorisation is unique. In other
words, if, say, in the case $\mathbf{E}\xi < 0$, we obtained a factorisation

$$
1 - \varphi(\lambda) = \mathfrak{f}_+(\lambda)\,\mathfrak{f}_-(\lambda),
$$

where $\mathfrak{f}_\pm$ are analytic on $\Pi_\pm$ and continuous on $\Pi_\pm \cup \Pi$, then under what conditions
can we state that

$$
\mathbf{E}e^{i\lambda S} = \frac{\mathfrak{f}_+(0)}{\mathfrak{f}_+(\lambda)}
$$

(cf. Theorem 12.3.2)? In order to answer this question, in contrast to the above, we
will have to introduce here restrictions on the distribution of $\xi$.


1. We will assume that $\mathbf{E}\xi$ exists, and in the case $\mathbf{E}\xi = 0$ that $\mathbf{E}\xi^2$ also exists.

2. Regarding the structure of the distribution of $\xi$ we will assume that either

(a) the distribution $F$ is non-lattice and the Cramér condition on ch.f. holds:

$$
\limsup_{|\lambda| \to \infty,\ \operatorname{Im}\lambda = 0} |\varphi(\lambda)| < 1, \qquad (12.5.1)
$$

or

(b) the distribution $F$ is arithmetic.


Condition (12.5.1) always holds once the distribution $F$ has a nonzero absolutely
continuous component. Indeed, if $F = F_a + F_s + F_d$ is the decomposition of $F$ into
the absolutely continuous, singular and discrete components then, by the Lebesgue
theorem, $\int e^{i\lambda x} F_a(dx) \to 0$ as $|\lambda| \to \infty$ on $\operatorname{Im}\lambda = 0$, and so

$$
\limsup_{|\lambda| \to \infty} |\varphi(\lambda)| \le F_s((-\infty, \infty)) + F_d((-\infty, \infty)) < 1.
$$

For lattice distributions concentrated at the points $a + hk$, $k$ being an integer,
condition (12.5.1) is evidently not satisfied since, for $\lambda = 2\pi j/h$, we have
$|\varphi(\lambda)| = |e^{i 2\pi a j/h}| = 1$ for all integers $j$. The condition is also not met for any
discrete distribution, since any "part" of such a distribution, concentrated on a finite
number of points, can be approximated arbitrarily well by a lattice distribution. For
singular distributions, condition (12.5.1) can still be satisfied.

Since, for non-lattice distributions, $|\varphi(\lambda)| < 1$ for $\lambda \ne 0$, under condition (12.5.1)
one has

$$
\sup_{|\lambda| > \varepsilon} |\varphi(\lambda)| < 1 \qquad (12.5.2)
$$

for any $\varepsilon > 0$. This means that the function $\mathfrak{f}(\lambda) = 1 - \varphi(\lambda)$ has no zeros on the real
line $\Pi$ (completed by the points $\pm\infty$) except at the point $\lambda = 0$.

In case (b), when the distribution $F$ is arithmetic, one can consider the ch.f.
$\varphi(\lambda)$ on the segment $[0, 2\pi]$ only or, which is the same, consider the generating
function $p(z) = \mathbf{E}z^{\xi}$, in which case we will be interested in the factorisation of the
function $1 - p(z)$ on the unit circle $|z| = 1$.

Under the aforementioned conditions, we can "tweak" the function $1 - \varphi(\lambda)$ so
that it allows canonical factorisation.

In  this  section  we  will  confine  ourselves  to  the  non-lattice  case.  The  arithmetic 
case  will  be  considered  in  Sect.  12.6. 


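Lemma 12.5.1 Let the distribution $F$ be non-lattice and condition (12.5.1) hold.
Then:

1. If $\mathbf{E}\xi < 0$ then the function

$$
\mathfrak{v}(\lambda) := \frac{1 - \varphi(\lambda)}{i\lambda}\,(i\lambda + 1) \qquad (12.5.3)
$$

belongs to $\mathfrak{K}$ and allows a unique canonical factorisation

$$
\mathfrak{v}(\lambda) = \mathfrak{v}_+(\lambda)\,\mathfrak{v}_-(\lambda),
$$

where

$$
\mathfrak{v}_+(\lambda) := 1 - \mathbf{E}\bigl(e^{i\lambda \chi_+};\ \eta_+ < \infty\bigr) = \frac{1 - p}{\mathbf{E}e^{i\lambda S}}, \qquad (12.5.4)
$$

$$
\mathfrak{v}_-(\lambda) := \frac{1 - \mathbf{E}e^{i\lambda \chi_-^0}}{i\lambda}\,(i\lambda + 1). \qquad (12.5.5)
$$

2. If $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 < \infty$ then the function

$$
\mathfrak{v}^0(\lambda) := \frac{1 - \varphi(\lambda)}{\lambda^2}\,(\lambda^2 + 1) \qquad (12.5.6)
$$

belongs to $\mathfrak{K}$ and allows a unique canonical factorisation

$$
\mathfrak{v}^0(\lambda) = \mathfrak{v}^0_+(\lambda)\,\mathfrak{v}^0_-(\lambda),
$$

where

$$
\mathfrak{v}^0_+(\lambda) := \frac{1 - \mathbf{E}e^{i\lambda \chi_+}}{i\lambda}\,(i\lambda - 1), \qquad
\mathfrak{v}^0_-(\lambda) := \frac{1 - \mathbf{E}e^{i\lambda \chi_-^0}}{i\lambda}\,(i\lambda + 1) \qquad (12.5.7)
$$

(cf. Corollaries 12.2.2 and 12.2.4).


Here we do not consider the case $\mathbf{E}\xi > 0$ since it is "symmetric" to the case
$\mathbf{E}\xi < 0$ and the corresponding assertion can be derived from assertion 1 of the
lemma by applying it to the random variables $-\xi_k$ (or by changing $\lambda$ to $-\lambda$ in the
identities), so that in the case $\mathbf{E}\xi > 0$, the function $\frac{1 - \varphi(\lambda)}{i\lambda}(i\lambda - 1)$ will allow a
unique canonical factorisation.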

The  uniqueness  of  the  canonical  factorisation  immediately  implies  the  following 
result. 


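Corollary 12.5.1 If, for $\mathbf{E}\xi < 0$, we have a canonical factorisation

$$
\mathfrak{v}(\lambda) = \mathfrak{w}_+(\lambda)\,\mathfrak{w}_-(\lambda),
$$

then

$$
\mathbf{E}e^{i\lambda S} = \frac{\mathfrak{w}_+(0)}{\mathfrak{w}_+(\lambda)}. \qquad (12.5.8)
$$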


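Proof of Lemma 12.5.1 Let $\mathbf{E}\xi < 0$. Since

$$
\frac{1 - \varphi(\lambda)}{i\lambda} \to -\mathbf{E}\xi > 0
$$

as $\lambda \to 0$ and (12.5.1) is satisfied, we see that $\mathfrak{v}(\lambda)$ is bounded and continuous on $\Pi$
and is bounded away from zero. This means that $\mathfrak{v}(\lambda) \in \mathfrak{K}$.

Further, by Corollary 12.2.2 (see (12.2.1))

$$
\mathfrak{v}(\lambda) = \frac{\bigl(1 - \mathbf{E}e^{i\lambda \chi_-^0}\bigr)(i\lambda + 1)}{i\lambda}\,\bigl[1 - \mathbf{E}\bigl(e^{i\lambda \chi_+};\ \eta_+ < \infty\bigr)\bigr],
$$

where $\mathbf{E}\chi_-^0 \in (-\infty, 0)$. Therefore, similarly to the above, we find that

$$
\mathfrak{v}_-(\lambda) := \frac{\bigl(1 - \mathbf{E}e^{i\lambda \chi_-^0}\bigr)(i\lambda + 1)}{i\lambda} \in \mathfrak{K}.
$$

Furthermore, $\mathfrak{v}_-(\lambda) \in \mathfrak{K}_-$ (the factor $i\lambda + 1$ has a zero at the point $\lambda = i \in \Pi_+$).
Evidently, we also have

$$
\mathfrak{v}_+(\lambda) = 1 - \mathbf{E}\bigl(e^{i\lambda \chi_+};\ \eta_+ < \infty\bigr) \in \mathfrak{K} \cap \mathfrak{K}_+.
$$

This proves the first assertion of the lemma. The last equality in (12.5.4) follows
from Theorem 12.3.2. The uniqueness follows from Lemma 12.1.1.

The second assertion is proved in a similar way using Corollary 12.2.5, which
implies that $\mathbf{E}\chi_+ \in (0, \infty)$, $\mathbf{E}\chi_-^0 \in (-\infty, 0)$, and

$$
\mathfrak{v}^0(\lambda) = \left[\frac{\bigl(1 - \mathbf{E}e^{i\lambda \chi_+}\bigr)(i\lambda - 1)}{i\lambda}\right] \left[\frac{\bigl(1 - \mathbf{E}e^{i\lambda \chi_-^0}\bigr)(i\lambda + 1)}{i\lambda}\right], \qquad (12.5.9)
$$

where, as before, we can show that $\mathfrak{v}^0(\lambda) \in \mathfrak{K}$ and the factors on the right-hand side
of (12.5.9) belong to $\mathfrak{K} \cap \mathfrak{K}_\pm$, correspondingly. The lemma is proved. □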


12.5.2  Classes  of  Distributions  on  the  Positive  Half-Line  with 
Rational  Ch.F.s 


As we saw in Example 7.1.5, the ch.f. of the exponential distribution with density
$\beta e^{-\beta x}$ on $(0, \infty)$ is $\beta/(\beta - i\lambda)$. The $j$-th power of this ch.f. corresponds to the
gamma-distribution $\Gamma_{\beta, j}$ (the $j$-th convolution of the exponential distribution) with
density (see Sect. 7.7)

$$
\frac{\beta^j x^{j-1} e^{-\beta x}}{(j - 1)!}, \qquad x > 0.
$$

This means that a density of the form

$$
\sum_{k=1}^{K} \sum_{j=1}^{l_k} a_{kj}\, x^{j-1} e^{-\beta_k x} \qquad (12.5.10)
$$

on $(0, \infty)$ (where all $\beta_k > 0$ are different) can then be considered as a mixture
of gamma-distributions and its ch.f. will be a rational function $P_m(\lambda)/Q_n(\lambda)$, where
$P_m$ and $Q_n$ are polynomials of degrees $m$ and $n$, respectively. For definiteness, we
can put

$$
Q_n(\lambda) = \prod_{k=1}^{K} (\beta_k - i\lambda)^{l_k}, \qquad (12.5.11)
$$

and necessarily $m < n$ (see Property 7.1.8) with $n = \sum_{k=1}^{K} l_k$. Here all the zeros of
the polynomial $Q_n(-i\mu)$, regarded as a polynomial in $\mu$, are real. But not only densities
of the form (12.5.10) can have rational ch.f.s. Clearly, the Fourier transform of the
function $e^{-\beta x}\cos \gamma x$, which can be rewritten as

$$
\tfrac{1}{2}\, e^{-\beta x}\bigl(e^{i\gamma x} + e^{-i\gamma x}\bigr), \qquad (12.5.12)
$$

will also be a rational function. Complex-valued functions of this kind will have
poles that are symmetric with respect to the imaginary line (in our case, at the points
$\lambda = -i\beta \pm \gamma$). Convolutions of functions of the form (12.5.12) will have a more
complex form but will not go beyond representation (12.5.10), where $\beta_k$ are "symmetric"
complex numbers. Clearly, densities of the form (12.5.10), where $\beta_k$ are
either real and positive or complex and symmetric, $\operatorname{Re}\beta_k > 0$, exhaust all the
distributions with rational ch.f.s (the coefficients of the "conjugate" complex-valued
exponentials must coincide to avoid the presence of irremovable complex terms).

It is obvious that the converse is also true: rational ch.f.s $P_m(\lambda)/Q_n(\lambda)$ correspond
to densities of the form (12.5.10). In order to show this it suffices to decompose
$P_m(\lambda)/Q_n(\lambda)$ into partial fractions, for which the inverse Fourier transforms
are known.

We will call densities of the form (12.5.10) on $(0, \infty)$ exponential polynomials
with exponents $\beta_k$. We will call the number $l_k$ the multiplicity of the exponent $\beta_k$;
it corresponds to the multiplicity of the pole of the Fourier transform at the point
$\lambda = -i\beta_k$ (recall that $Q_n(\lambda) = \prod_{k=1}^{K} (\beta_k - i\lambda)^{l_k}$). One can approximate an arbitrary
distribution on $(0, \infty)$ by exponential polynomials (for more details, see [3]).


12.5.3 Explicit Canonical Factorisation of the Function $\mathfrak{v}(\lambda)$ in
the Case when the Right Tail of the Distribution F Is an
Exponential Polynomial

Consider a distribution $F$ on the whole real line $(-\infty, \infty)$ with $\mathbf{E}\xi < 0$ and such
that, for $x > 0$, the distribution has a density that is an exponential polynomial
(12.5.10). Denote by $\mathcal{ER}$ the class of all such distributions. The ch.f. of a
distribution $F \in \mathcal{ER}$ can be represented as

$$
\varphi(\lambda) = \varphi^+(\lambda) + \varphi^-(\lambda),
$$

where the function

$$
\varphi^-(\lambda) = \mathbf{E}\bigl(e^{i\lambda \xi};\ \xi \le 0\bigr), \qquad \xi \mathrel{\subset\!\!\!=} F,
$$

is analytic on $\Pi_-$ and continuous on $\Pi_- \cup \Pi$, and $\varphi^+(\lambda)$ is a rational function

$$
\varphi^+(\lambda) = \frac{P_m(\lambda)}{Q_n(\lambda)}, \qquad m < n, \qquad (12.5.13)
$$

analytic on $\Pi_+$. Here $\varphi^+(\lambda)$ is a ch.f. up to the factor $\mathbf{P}(\xi > 0) > 0$.
It is important to note that, for real $\mu$, the equality

$$
\psi^+(\mu) := \varphi^+(-i\mu) = \mathbf{E}\bigl(e^{\mu \xi};\ \xi > 0\bigr) = \frac{P_m(-i\mu)}{Q_n(-i\mu)} \qquad (12.5.14)
$$

only makes sense for $\mu < \beta_1$, where $\beta_1$ is the minimal zero of the polynomial
$Q_n(-i\mu)$ (i.e. the pole of $\psi^+(\mu)$). It is necessarily a simple and real root since
the function $\psi^+(\mu)$ is real and monotonically increasing. Further, $\psi^+(\mu) = \infty$
for $\mu \ge \beta_1$. Therefore the function $\mathbf{E}(e^{i\lambda \xi};\ \xi > 0)$ is undefined for $\operatorname{Re} i\lambda \ge \beta_1$
($\operatorname{Im}\lambda \le -\beta_1$). However, the right-hand side of (12.5.14) (and hence $\varphi^+(\lambda)$) can be
analytically continued onto the lower half-plane $\operatorname{Im}\lambda \le -\beta_1$ to a function defined
on the whole complex plane. In what follows, when we will be discussing zeros of
the function $1 - \varphi(\lambda)$ on $\Pi_-$, we will mean zeros of the function obtained from this
analytical continuation, i.e. of the function $1 - \varphi^-(\lambda) - P_m(\lambda)/Q_n(\lambda)$.

Further, note that, for distributions from the class $\mathcal{ER}$, the Cramér condition
(12.5.1) on ch.f.s always holds, since $\varphi^+(\lambda) \to 0$ as $|\lambda| \to \infty$, and

$$
\limsup_{|\lambda| \to \infty,\ \lambda \in \Pi} |\varphi(\lambda)| = \limsup_{|\lambda| \to \infty,\ \lambda \in \Pi} |\varphi^-(\lambda)| \le \mathbf{P}(\xi \le 0) < 1.
$$

For a distribution $F \in \mathcal{ER}$, the canonical factorisation of the functions $\mathfrak{v}(\lambda)$ and
$\mathfrak{v}^0(\lambda)$ (see (12.5.3) and (12.5.6)) can be obtained in explicit form expressed in terms
of the zeros of the function $1 - \varphi(\lambda)$.


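Theorem 12.5.1 Let there exist $\mathbf{E}\xi < 0$. In order for the positive component $\mathfrak{w}_+(\lambda)$
of a canonical factorisation

$$
\mathfrak{v}(\lambda) = \mathfrak{w}_+(\lambda)\,\mathfrak{w}_-(\lambda), \qquad \mathfrak{w}_\pm \in \mathfrak{K}_\pm \cap \mathfrak{K},
$$

to be a rational function, it is necessary and sufficient that the function

$$
\varphi^+(\lambda) = \mathbf{E}\bigl(e^{i\lambda \xi};\ \xi > 0\bigr)
$$

is rational.

If $\varphi^+ = P_m/Q_n$ is an uncancellable ratio of polynomials $P_m$ and $Q_n$ of degrees
$m$ and $n$, respectively, $m < n$, then the function $1 - \varphi(\lambda)$ has precisely $n$ zeros on
$\Pi_-$ (we denote them by $-i\mu_1, \ldots, -i\mu_n$), and

$$
\mathfrak{w}_+(\lambda) = \frac{\prod_{k=1}^{n} (\mu_k - i\lambda)}{Q_n(\lambda)}, \qquad (12.5.15)
$$

where $Q_n(-i\mu_k) \ne 0$ (i.e. ratio (12.5.15) is uncancellable).

If all zeros $-i\mu_k$ are arranged in descending order of their imaginary parts:

$$
\operatorname{Re}\mu_1 \le \operatorname{Re}\mu_2 \le \cdots \le \operatorname{Re}\mu_n,
$$

then the zero $-i\mu_1$ will be simple and purely imaginary, $\mu_1 < \min(\operatorname{Re}\mu_2, \beta_1)$,
where $\beta_1$ is the minimal zero of $Q_n(-i\mu)$.

The theorem implies that the component $\mathfrak{w}_-(\lambda)$ can also be found in an explicit
form:

$$
\mathfrak{w}_-(\lambda) = \frac{\mathfrak{v}(\lambda)}{\mathfrak{w}_+(\lambda)} = \frac{(1 - \varphi(\lambda))(i\lambda + 1)\,Q_n(\lambda)}{i\lambda \prod_{k=1}^{n} (\mu_k - i\lambda)}.
$$

From Corollary 12.5.1 we obtain the following assertion.


Corollary 12.5.2 If $\mathbf{E}\xi < 0$ and $\varphi^+ = P_m/Q_n$ then

$$
\mathbf{E}e^{i\lambda S} = \frac{\mathfrak{w}_+(0)}{\mathfrak{w}_+(\lambda)} = \frac{Q_n(\lambda) \prod_{k=1}^{n} \mu_k}{Q_n(0) \prod_{k=1}^{n} (\mu_k - i\lambda)}.
$$

By Theorem 12.5.1 and (12.3.3) we also have

$$
\mathbf{E}e^{i\lambda \chi_-^0} = 1 - \frac{(1 - \varphi(\lambda))\,Q_n(\lambda) \prod_{k=1}^{n} \mu_k}{(1 - p) \prod_{k=1}^{n} (\mu_k - i\lambda)\,Q_n(0)},
\qquad
\mathbf{E}\bigl(e^{i\lambda \chi_+};\ \eta_+ < \infty\bigr) = 1 - \frac{(1 - p) \prod_{k=1}^{n} (\mu_k - i\lambda)\,Q_n(0)}{Q_n(\lambda) \prod_{k=1}^{n} \mu_k}. \qquad (12.5.16)
$$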

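Proof of Theorem 12.5.1 The proof of sufficiency will be divided into several stages.

1. In the vicinity of the point $\lambda = 0$ on the line $\Pi$, the value of

$$
\mathfrak{v}(\lambda) = \frac{(1 - \varphi(\lambda))(i\lambda + 1)}{i\lambda}
$$

lies in the vicinity of the point $-\mathbf{E}\xi > 0$. By virtue of (12.5.2), outside a neighbourhood
of zero one has

$$
|\arg(1 - \varphi(\lambda))| < \frac{\pi}{2}, \qquad \arg\frac{i\lambda + 1}{i\lambda} \in \Bigl(-\frac{\pi}{2}, \frac{\pi}{2}\Bigr), \qquad (12.5.17)
$$

where, for a complex number $z = |z|e^{i\gamma}$, $\arg z$ denotes the exponent $\gamma$. In (12.5.17)
$\arg z$ means the principal value of the argument from $(-\pi, \pi]$. Clearly, $\arg z_1 z_2 =
\arg z_1 + \arg z_2$. This implies that, when $\lambda$ changes from $-T$ to $T$ for large $T$, the
values of $\arg \mathfrak{v}(\lambda)$ do not leave the interval $(-\pi, \pi)$ and do not come close to its
boundaries. Moreover, the initial and final values of $\mathfrak{v}(\lambda)$ lie in the sector $\arg z \in
(-\frac{\pi}{2}, \frac{\pi}{2})$. This means that, for any $T$, the following relation is valid for the index of
the function $\mathfrak{v}$ on $[-T, T]$:

$$
\operatorname{ind}_T \mathfrak{v} := \frac{1}{2\pi} \int_{-T}^{T} d\bigl(\arg \mathfrak{v}(\lambda)\bigr) \in \Bigl(-\frac{1}{2}, \frac{1}{2}\Bigr). \qquad (12.5.18)
$$

(If the distribution $F$ has a density on $(-\infty, 0]$ as well then $\varphi(\pm T) \to 0$ and
$\operatorname{ind}_T \mathfrak{v} \to 0$ as $T \to \infty$.)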

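2. Represent the function $\mathfrak{v}$ as the product $\mathfrak{v} = \mathfrak{v}_1 \mathfrak{v}_2$, where

$$
\mathfrak{v}_1(\lambda) = \frac{(i\lambda + 1)^n}{Q_n(\lambda)}, \qquad
\mathfrak{v}_2(\lambda) = \frac{Q_n(\lambda)(1 - \varphi(\lambda))}{i\lambda\,(i\lambda + 1)^{n-1}} = \frac{Q_n(\lambda) - P_m(\lambda) - Q_n(\lambda)\varphi^-(\lambda)}{i\lambda\,(i\lambda + 1)^{n-1}}. \qquad (12.5.19)
$$

We show that

$$
|n + \operatorname{ind}_T \mathfrak{v}_2| < \frac{1}{2}. \qquad (12.5.20)
$$

In order to do this, we first note that the function $\mathfrak{v}_1$ is analytic on $\Pi_+$ and has there
a zero of multiplicity $n$ at the point $\lambda = i$. Consider a closed contour $\mathcal{T}_T^+$ consisting
of the segment $[-T, T]$ and the semicircle $|\lambda| = T$ lying in $\Pi_+$. According to
the argument principle in complex function theory, the number of zeros of the function
$\mathfrak{v}_1$ inside $\mathcal{T}_T^+$ equals the increment of the argument of $\mathfrak{v}_1(\lambda)$ divided by $2\pi$ when
moving along the contour in the positive direction, i.e.

$$
\frac{1}{2\pi} \int_{\mathcal{T}_T^+} d \arg \mathfrak{v}_1(\lambda) = n.
$$

As, moreover, $\mathfrak{v}_1(\lambda) \to (-1)^n = \mathrm{const}$ as $|\lambda| \to \infty$ (see (12.5.11) and (12.5.19)),
we see that the increment of $\arg \mathfrak{v}_1$ on the semicircle tends to 0 as $T \to \infty$, and
hence

$$
\lim_{T \to \infty} \operatorname{ind}_T \mathfrak{v}_1 = \frac{1}{2\pi} \int_{-\infty}^{\infty} d \arg \mathfrak{v}_1(\lambda) = n.
$$

It remains to note that $\operatorname{ind}_T \mathfrak{v} = \operatorname{ind}_T \mathfrak{v}_1 + \operatorname{ind}_T \mathfrak{v}_2$ and make use of (12.5.18).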

3. We show that $1 - \varphi(\lambda)$ has precisely $n$ zeros in $\Pi_-$. To this end, we first show
that the function $\mathfrak{v}_2(\lambda)$, which is analytic in $\Pi_-$ and continuous on $\Pi_- \cup \Pi$, has $n$
zeros in $\Pi_-$. Consider the positively oriented closed contour $\mathcal{T}_T^-$ consisting of the
segment $[-T, T]$ (traversed in the negative direction) and the lower half of the circle
$|\lambda| = T$, and compute

$$
\frac{1}{2\pi} \int_{\mathcal{T}_T^-} d \arg \mathfrak{v}_2(\lambda). \qquad (12.5.21)
$$

Since $\mathfrak{v}_2(\lambda) \sim (-1)^n (1 - \varphi^-(\lambda))$ (see (12.5.11) and (12.5.19)), $|\varphi^-(\lambda)| < 1$ as
$|\lambda| \to \infty$, $\operatorname{Im}\lambda \le 0$, for large $T$ the part of integral (12.5.21) over the semicircle
will be less than 1/2 in absolute value. Comparing this with (12.5.20) we obtain
that integral (12.5.21), being an integer, is necessarily equal to $n$. This means that
$\mathfrak{v}_2(\lambda)$ has exactly $n$ zeros in $\Pi_-$, which we will denote by $-i\mu_1, \ldots, -i\mu_n$. Since
$Q_n(-i\mu_k) \ne 0$ (otherwise we would have, by (12.5.19), $P_m(-i\mu_k) = 0$, which
would mean cancellability of the fraction $P_m/Q_n$), the function $1 - \varphi(\lambda)$ has in $\Pi_-$
the same zeros as $\mathfrak{v}_2(\lambda)$ (see (12.5.19)).

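4. It remains to put

$$
\mathfrak{w}_+(\lambda) = \frac{\prod_{k=1}^{n} (\mu_k - i\lambda)}{Q_n(\lambda)}, \qquad
\mathfrak{w}_-(\lambda) = \frac{\bigl(Q_n(\lambda) - P_m(\lambda) - Q_n(\lambda)\varphi^-(\lambda)\bigr)(i\lambda + 1)}{i\lambda \prod_{k=1}^{n} (\mu_k - i\lambda)}
$$

and note that $\mathfrak{w}_\pm \in \mathfrak{K}_\pm \cap \mathfrak{K}$.

The last assertion of the theorem follows from the fact that the real function
$\psi(\mu) = \varphi(-i\mu)$ for $\operatorname{Im}\mu = 0$ is convex on $[0, \beta_1)$, $\psi'(0) = \mathbf{E}\xi < 0$ and
$\psi(\mu) \to \infty$ as $\mu \to \beta_1$. Therefore on $(0, \beta_1)$ there exists a unique real solution $\mu_1$
to the equation $\psi(\mu) = 1$. There are no complex zeros in the half-plane $\operatorname{Re}\mu \le \mu_1$
since in this region, for $\operatorname{Im}\mu \ne 0$, one has

$$
|\psi(\mu)| < \psi(\operatorname{Re}\mu) \le \psi(\mu_1) = 1
$$

because of the presence of an absolutely continuous component.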

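Necessity. Now let $\mathfrak{w}_+(\lambda)$ be rational. This means that

$$
\mathfrak{w}_+(\lambda) = c_1 + \int_0^{\infty} e^{i\lambda x} g(x)\, dx,
$$

where $c_1 = \mathfrak{w}_+(i\infty)$ and $g(x)$ is an exponential polynomial. It follows from the
equality (see (12.5.5))

$$
1 - \varphi(\lambda) = \mathfrak{w}_+(\lambda)\,\frac{i\lambda\, \mathfrak{w}_-(\lambda)}{i\lambda + 1} = c_2\, \mathfrak{w}_+(\lambda)\bigl(1 - \mathbf{E}e^{i\lambda \chi_-^0}\bigr)
= c_2 \Bigl(c_1 + \int_0^{\infty} e^{i\lambda x} g(x)\, dx\Bigr) \int_{-\infty}^{0} e^{i\lambda x}\, dW(x),
$$

where $W(x) = -\mathbf{P}(\chi_-^0 < x)$ for $x < 0$, $c_2 = \mathrm{const}$, that $\xi$ has a density for $x > 0$
that is equal to

$$
-c_2 \int_{-\infty}^{0} dW(t)\, g(x - t).
$$

Since the integral

$$
\int_{-\infty}^{0} dW(t)\,(x - t)^k e^{-\beta(x - t)} = e^{-\beta x} \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} c_{kj}\, x^j, \qquad
c_{kj} = \int_{-\infty}^{0} dW(t)\, t^{k-j} e^{\beta t},
$$

is an exponential polynomial, the integral $\int_{-\infty}^{0} dW(t)\, g(x - t)$ is also an exponential
polynomial, which implies the rationality of $\mathbf{E}(e^{i\lambda \xi};\ \xi > 0)$. The theorem is
proved. □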


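Example 12.5.1 Let the distribution $F$ be exponential on the positive half-line:

$$
\mathbf{P}(\xi > x) = q e^{-\beta x}, \qquad \beta > 0, \quad q < 1.
$$

Then $\varphi^+(\lambda) = q\beta/(\beta - i\lambda)$ and we can put $m = 0$, $n = 1$, $P_0(\lambda) = q\beta$, $Q_1(\lambda) =
\beta - i\lambda$. The equation $\psi(\mu) := \mathbf{E}e^{\mu \xi} = 1$ has, in the half-plane $\operatorname{Re}\mu > 0$, the unique
solution $\mu_1$,

$$
\mathfrak{w}_+(\lambda) = \frac{\mu_1 - i\lambda}{Q_1(\lambda)}
$$

(see (12.5.15)). By Corollary 12.5.2,

$$
\mathbf{E}e^{i\lambda S} = \frac{\mu_1 Q_1(\lambda)}{Q_1(0)(\mu_1 - i\lambda)} = \frac{\mu_1(\beta - i\lambda)}{\beta(\mu_1 - i\lambda)} = \frac{\mu_1}{\beta} + \frac{\beta - \mu_1}{\beta}\,\frac{\mu_1}{\mu_1 - i\lambda}.
$$

This yields $\mathbf{P}(S = 0) = \mu_1/\beta$,

$$
\frac{\mathbf{P}(S \in dx)}{dx} = \frac{\beta - \mu_1}{\beta}\,\mu_1 e^{-\mu_1 x} \quad \text{for } x > 0,
$$

i.e. the distribution of $S$ is exponential on $(0, \infty)$ with parameter $\mu_1$ and has a
positive atom at zero.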
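
To make the example concrete, one can fix a left tail and compute $\mu_1$ numerically. The sketch below is an illustration only: the exponential left tail $\mathbf{P}(\xi < -x) = (1 - q)e^{-\alpha x}$, the parameter values and all function names are assumptions made here, since the example itself specifies only the right tail. Under this assumption $\psi(\mu) = q\beta/(\beta - \mu) + (1 - q)\alpha/(\alpha + \mu)$, and solving $\psi(\mu) = 1$ by hand gives $\mu_1 = \beta - q(\alpha + \beta)$, which the bisection reproduces.

```python
import math


def psi(mu, q, beta, alpha):
    """E exp(mu * xi) for the assumed two-sided exponential law, -alpha < mu < beta."""
    return q * beta / (beta - mu) + (1 - q) * alpha / (alpha + mu)


def solve_mu1(q, beta, alpha, iters=200):
    """Bisection for the root of psi(mu) = 1 on (0, beta)."""
    lo, hi = 1e-6, beta - 1e-9   # psi(lo) < 1 because psi'(0) = E xi < 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if psi(mid, q, beta, alpha) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tail_of_S(x, mu1, beta):
    """P(S > x) for x >= 0, obtained by integrating the density of S."""
    return (1 - mu1 / beta) * math.exp(-mu1 * x)


# Assumed parameters; E xi = q/beta - (1 - q)/alpha = -0.4 < 0.
q, beta, alpha = 0.4, 2.0, 1.0
mu1 = solve_mu1(q, beta, alpha)

print(mu1, beta - q * (alpha + beta))   # numerical root and closed form
print(mu1 / beta)                       # P(S = 0)
print(tail_of_S(1.0, mu1, beta))        # P(S > 1)
```

In the risk model of Sect. 12.4.1 the last function gives the long-run ruin probability for initial surplus $x$ under the assumed distribution.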


Example 12.5.2 (A generalisation of Example 12.5.1) Let $F$ have, on the positive
half-line, the density $\sum_{k=1}^{n} a_k e^{-\beta_k x}$ (a sum of exponentials), where $0 < \beta_1 < \beta_2 <
\cdots < \beta_n$, $a_k > 0$. Then

$$
Q_n(\lambda) = \prod_{k=1}^{n} (\beta_k - i\lambda).
$$

As was already noted in Theorem 12.5.1, the equation $\psi(\mu) := \varphi(-i\mu) = 1$ has, on
the interval $(0, \beta_1)$, a unique zero $\mu_1$. The function $\psi^-(\mu) := \varphi^-(-i\mu)$ is continuous,
positive, and bounded for $\mu > 0$. On each interval $(\beta_k, \beta_{k+1})$, $k = 1, \ldots, n - 1$,
the function

$$
\psi^+(\mu) := \varphi^+(-i\mu) = \sum_{k=1}^{n} \frac{a_k}{\beta_k - \mu}
$$

is continuous and changes from $-\infty$ to $\infty$. Therefore, on each of these intervals,
there exists at least one root $\mu_{k+1}$ of the equation $\psi(\mu) = 1$. Since by Theorem
12.5.1 there are only $n$ roots of this equation in $\operatorname{Re}\mu > 0$, we obtain that $\mu_{k+1}$
is the unique root in $(\beta_k, \beta_{k+1})$ and


$$
\mathfrak{w}_+(\lambda) = \frac{\prod_{k=1}^{n} (\mu_k - i\lambda)}{Q_n(\lambda)}, \qquad
\mathbf{E}e^{i\lambda S} = \prod_{k=1}^{n} \frac{(\beta_k - i\lambda)\,\mu_k}{(\mu_k - i\lambda)\,\beta_k}. \qquad (12.5.22)
$$

This means that $1 - p := \mathbf{P}(S = 0) = \prod_{k=1}^{n} \mu_k/\beta_k$, and

$$
\frac{\mathbf{P}(S \in dx)}{dx} = \sum_{k=1}^{n} b_k e^{-\mu_k x} \quad \text{for } x > 0,
$$

where $\mu_k \in (\beta_{k-1}, \beta_k)$, $k = 1, \ldots, n$, $\beta_0 = 0$, and the coefficients $b_k$ are defined by
the decomposition of (12.5.22) into partial fractions.

By (12.5.16),

$$
\mathbf{E}\bigl(e^{i\lambda \chi_+};\ \eta_+ < \infty\bigr) = 1 - \frac{1 - p}{\mathbf{E}e^{i\lambda S}} = 1 - \prod_{k=1}^{n} \frac{\mu_k - i\lambda}{\beta_k - i\lambda}, \qquad (12.5.23)
$$

so the conditional distribution of $\chi_+$ given $\eta_+ < \infty$ has a density which is equal to

$$
\sum_{k=1}^{n} c_k e^{-\beta_k x}, \qquad (12.5.24)
$$

where the coefficients $c_k$, similarly to the above, are defined by the expansion
of the right-hand side of (12.5.23) into partial fractions. Relation (12.5.24) means
that the density of $\chi_+$ has the same "structure" as the density of $\xi$ does for $x > 0$,
but differs in coefficients of the exponentials only. By (12.5.16) this property of the
density of $\chi_+$ holds in the general case as well.
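
The interlacing of the roots $\mu_k$ with the exponents $\beta_k$ can be observed numerically. The following sketch is a hedged illustration: it assumes the right-tail density $\sum_k a_k e^{-\beta_k x}$ of the example together with an exponential left tail $\mathbf{P}(\xi < -x) = (1 - q)e^{-\alpha x}$, where $q = \sum_k a_k/\beta_k$; the left tail, the parameter values and the function names are assumptions made for this illustration.

```python
def psi(mu, a, beta, alpha):
    """E exp(mu * xi): rational right part plus the assumed exponential left part."""
    q = sum(ak / bk for ak, bk in zip(a, beta))          # P(xi > 0)
    right = sum(ak / (bk - mu) for ak, bk in zip(a, beta))
    left = (1 - q) * alpha / (alpha + mu)
    return right + left


def bisect_root(f, lo, hi, iters=200):
    """Root of f on (lo, hi), assuming f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def roots_of_psi(a, beta, alpha):
    """One root of psi(mu) = 1 in each interval (beta_{k-1}, beta_k), beta_0 = 0."""
    f = lambda mu: psi(mu, a, beta, alpha) - 1.0
    edges = [0.0] + list(beta)
    roots = []
    for k in range(len(beta)):
        # Just to the right of 0 we have psi < 1 because psi'(0) = E xi < 0;
        # just to the right of a pole beta_k the function psi tends to -infinity.
        lo = edges[k] + (1e-6 if k == 0 else 1e-9)
        roots.append(bisect_root(f, lo, edges[k + 1] - 1e-9))
    return roots


# Assumed parameters; E xi = 0.2/1 + 0.3/9 - 0.7/1 < 0.
a, beta, alpha = [0.2, 0.3], [1.0, 3.0], 1.0
mu = roots_of_psi(a, beta, alpha)

p_zero = 1.0
for mk, bk in zip(mu, beta):
    p_zero *= mk / bk          # P(S = 0) as the product of mu_k / beta_k

print(mu)
print(p_zero)
```

Bisection is used on each interval separately because $\psi(\mu) - 1$ changes sign exactly once there, which is the content of the root-counting argument in the example.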


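12.5.4 Explicit Factorisation of the Function $\mathfrak{v}(\lambda)$ when the Left
Tail of the Distribution F Is an Exponential Polynomial


Now consider the case where the left tail of the distribution $F$ has a density which is
an exponential polynomial (belongs to the class $\mathcal{EL}$). In this case,

$$
\varphi^-(\lambda) = \mathbf{E}\bigl(e^{i\lambda \xi};\ \xi < 0\bigr) = \frac{P_m(\lambda)}{Q_n(\lambda)},
$$

where

$$
Q_n(\lambda) = \prod_{k=1}^{K} (\beta_k - i\lambda)^{l_k}, \qquad n = \sum_{k=1}^{K} l_k, \qquad \operatorname{Re}\beta_k < 0, \qquad m < n.
$$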


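Theorem 12.5.2 Let there exist $\mathbf{E}\xi < 0$. For the positive component of the canonical
factorisation $\mathfrak{v}(\lambda) = \mathfrak{w}_+(\lambda)\,\mathfrak{w}_-(\lambda)$ of the function

$$
\mathfrak{v}(\lambda) = \frac{(1 - \varphi(\lambda))(i\lambda + 1)}{i\lambda}
$$

to be representable as

$$
\mathfrak{w}_+(\lambda) = (1 - \varphi(\lambda))\,R(\lambda),
$$

where $R(\lambda)$ is a rational function, it is necessary and sufficient that the function
$\varphi^-(\lambda)$ is rational. If $\varphi^-(\lambda) = P_m(\lambda)/Q_n(\lambda)$ then the function $1 - \varphi(\lambda)$
has precisely $n - 1$ zeros in the half-plane $\operatorname{Im}\lambda > 0$ which we denote by
$-i\mu_1, \ldots, -i\mu_{n-1}$, and

$$
R(\lambda) = \frac{Q_n(\lambda)}{i\lambda \prod_{k=1}^{n-1} (\mu_k - i\lambda)}.
$$

Theorem 12.5.2, Corollary 12.5.1 and (12.3.3) imply the following assertion.


Corollary 12.5.3 If $\mathbf{E}\xi < 0$ and $\varphi^-(\lambda) = P_m(\lambda)/Q_n(\lambda)$ then

$$
\mathbf{E}e^{i\lambda S} = \frac{\mathfrak{w}_+(0)}{\mathfrak{w}_+(\lambda)} = \frac{-\mathbf{E}\xi\; Q_n(0)\, i\lambda \prod_{k=1}^{n-1} (\mu_k - i\lambda)}{(1 - \varphi(\lambda))\, Q_n(\lambda) \prod_{k=1}^{n-1} \mu_k},
$$

$$
\mathbf{E}e^{i\lambda \chi_-} = 1 + \frac{\mathbf{E}\xi\; Q_n(0)\, i\lambda \prod_{k=1}^{n-1} (\mu_k - i\lambda)}{(1 - p_0) \prod_{k=1}^{n-1} \mu_k\; Q_n(\lambda)}.
$$

Here the density of $\chi_-$ has the same "structure" as the density of $\xi$ does for
$x < 0$.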


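Proof of Theorem 12.5.2 The proof is close to that of Theorem 12.5.1, but unfortunately is not its direct consequence. We present here a brief proof of Theorem 12.5.2 under the simplifying assumption that the distribution F is absolutely continuous. Using the scheme of the proof of Theorem 12.5.1, the reader can easily reconstruct the argument in the general case.

Sufficiency. As in Theorem 12.5.1, we verify that the trajectory of $\mathfrak{v}(\lambda)$, $-\infty < \lambda < \infty$, does not intersect the ray $\arg\mathfrak{v} = -\pi$, so in our case there exists

$$\operatorname{ind}\mathfrak{v} := \lim_{T\to\infty} \operatorname{ind}_T \mathfrak{v} = 0.$$

Put $\mathfrak{v} := \mathfrak{v}_1\mathfrak{v}_2$, where

$$\mathfrak{v}_1 := \frac{Q_n - P_m - Q_n\varphi^+}{i\lambda(i\lambda - 1)^{n-1}}, \qquad \mathfrak{v}_2 := \frac{(i\lambda + 1)(i\lambda - 1)^{n-1}}{Q_n}.$$

Clearly, $\mathfrak{v}_2 \in \mathcal{K}_- \cap \mathcal{K}$ and has exactly $n - 1$ zeros in $\Pi_-$. Hence, by the argument principle, $\operatorname{ind}\mathfrak{v}_2 = -(n - 1)$, and

$$\operatorname{ind}\mathfrak{v}_1 = -\operatorname{ind}\mathfrak{v}_2 = n - 1.$$

Since $\mathfrak{v}_1 \in \mathcal{K}_+ \cap \mathcal{K}$, again using the argument principle we obtain that $\mathfrak{v}_1$, as well as $1 - \varphi$, has exactly $n - 1$ zeros $-i\mu_1, \ldots, -i\mu_{n-1}$ in $\Pi_+$. Putting

$$\mathfrak{w}_+ := \frac{(1 - \varphi)Q_n}{i\lambda \prod_{k=1}^{n-1} (\mu_k - i\lambda)}, \qquad \mathfrak{w}_- := \frac{(i\lambda + 1) \prod_{k=1}^{n-1} (\mu_k - i\lambda)}{Q_n},$$

we obtain a canonical factorisation.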

Necessity. Similarly to the preceding arguments, the necessity follows from the factorisation identity

$$1 - \varphi(\lambda) = c_1\bigl(1 - \mathbf{E}(e^{i\lambda\chi_+};\ \eta_+ < \infty)\bigr)\mathfrak{w}_-(\lambda) = c_1\Bigl(1 + \int_0^\infty e^{i\lambda x}\, dV(x)\Bigr)\Bigl(c_2 + \int_{-\infty}^0 e^{i\lambda x} g(x)\, dx\Bigr),$$

where $V(x) = \mathbf{P}(\chi_+ > x;\ \eta_+ < \infty)$ for $x > 0$, $c_i = \text{const}$ and $g(x)$ is an exponential polynomial. The theorem is proved. □


As in Sect. 12.5.1, we do not consider the case $\mathbf{E}\xi > 0$ since it reduces to applying the aforementioned argument to the random variable $-\xi$.


12.5.5 Explicit Canonical Factorisation for the Function $\mathfrak{v}^0(\lambda)$

The goal of this subsection, as it was in Sects. 12.5.3 and 12.5.4, is to find an explicit form of the components $\mathfrak{w}^0_\pm(\lambda)$ in the canonical factorisation of the function $\mathfrak{v}^0(\lambda) = (1 - \varphi(\lambda))(\lambda^2 + 1)/\lambda^2$ in (12.5.6) in terms of the zeros of the function $1 - \varphi(\lambda)$ in the case where $\mathbf{E}\xi = 0$ and either $\varphi^+(\lambda)$ or $\varphi^-(\lambda)$ is a rational function. When $\mathbf{E}\xi = 0$, it is sufficient to consider the case where $\varphi^+(\lambda)$ is rational, i.e. the distribution F has on the positive half-line a density which is an exponential polynomial, so that

$$\varphi^+(\lambda) = \frac{P_m(\lambda)}{Q_n(\lambda)}, \qquad Q_n(\lambda) = \prod_{k=1}^{K} (\beta_k - i\lambda)^{l_k}, \qquad n = \sum_{k=1}^{K} l_k.$$

The case where it is the function $\varphi^-(\lambda)$ that is rational is treated by switching to the random variable $-\xi$.


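Theorem 12.5.3 Let $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 = \sigma^2 < \infty$. For the positive component $\mathfrak{w}^0_+(\lambda)$ of the canonical factorisation

$$\mathfrak{v}^0(\lambda) = \mathfrak{w}^0_+(\lambda)\mathfrak{w}^0_-(\lambda), \qquad \mathfrak{w}^0_\pm \in \mathcal{K}_\pm \cap \mathcal{K},$$

to be a rational function it is necessary and sufficient that the function $\varphi^+(\lambda) = \mathbf{E}(e^{i\lambda\xi};\ \xi > 0)$ is rational. If $\varphi^+(\lambda) = P_m(\lambda)/Q_n(\lambda)$ is an uncancellable ratio of polynomials of degrees $m$ and $n$, respectively, $m < n$, then the function $1 - \varphi(\lambda)$ has exactly $n - 1$ zeros in $\Pi_-$ which we denote by $-i\mu_1, \ldots, -i\mu_{n-1}$, and

$$\mathfrak{w}^0_+(\lambda) = \frac{\prod_{k=1}^{n-1} (\mu_k - i\lambda)\,(1 - i\lambda)}{Q_n(\lambda)}, \qquad \mathfrak{w}^0_-(\lambda) = \frac{(1 - \varphi(\lambda))(i\lambda + 1)Q_n(\lambda)}{\lambda^2 \prod_{k=1}^{n-1} (\mu_k - i\lambda)}. \tag{12.5.25}$$


Relation (12.5.3) and the uniqueness of canonical factorisation imply the following representation.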


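Corollary 12.5.4 Under the conditions of Theorem 12.5.3,

$$\mathbf{E}e^{i\lambda\chi_+} = 1 + \frac{i\lambda\, \mathbf{E}\chi_+\, Q_n(0) \prod_{k=1}^{n-1} (\mu_k - i\lambda)}{Q_n(\lambda) \prod_{k=1}^{n-1} \mu_k}.$$


Proof The corollary follows from (12.5.7), (12.5.25), the uniqueness of canonical factorisation and the equalities

$$\mathfrak{v}^0_+(0) = \mathbf{E}\chi_+,$$

$$1 - \mathbf{E}e^{i\lambda\chi_+} = \frac{i\lambda}{i\lambda - 1}\, \mathfrak{v}^0_+(0)\, \frac{\mathfrak{w}^0_+(\lambda)}{\mathfrak{w}^0_+(0)} = -\frac{i\lambda\, \mathbf{E}\chi_+\, Q_n(0) \prod_{k=1}^{n-1} (\mu_k - i\lambda)}{Q_n(\lambda) \prod_{k=1}^{n-1} \mu_k}.$$

Thus, here the “structure” of the density of $\chi_+$ again repeats the structure of the density of $\xi$ for $x > 0$. □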


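Proof of Theorem 12.5.3 The proof is similar to that of Theorem 12.5.1.

Sufficiency.

1. In the vicinity of the point $\lambda = 0$, $\lambda \in \Pi$, the value of $\mathfrak{v}^0(\lambda)$ lies in the vicinity of the point $\sigma^2/2 > 0$ by Property 7.1.5 of ch.f.s. Outside of a neighbourhood of zero, similarly to (12.5.17), we have

$$\arg(1 - \varphi(\lambda)) \in \Bigl(-\frac{\pi}{2}, \frac{\pi}{2}\Bigr), \qquad \arg\frac{\lambda^2 + 1}{\lambda^2} = 0.$$

This, analogously to (12.5.18), implies

$$\operatorname{ind}_T \mathfrak{v}^0 := \frac{1}{2\pi}\int_{-T}^{T} d(\arg\mathfrak{v}^0(\lambda)) \in (-b/2,\ b/2), \qquad b < 1.$$

2. Represent $\mathfrak{v}^0$ as $\mathfrak{v}^0 = \mathfrak{v}_1\mathfrak{v}_2$, where

$$\mathfrak{v}_1 := \frac{(i\lambda + 1)^n}{Q_n}, \qquad \mathfrak{v}_2 := \frac{(1 - \varphi)(1 - i\lambda)Q_n}{\lambda^2(i\lambda + 1)^{n-1}} = \frac{(Q_n - P_m - Q_n\varphi^-)(1 - i\lambda)}{\lambda^2(i\lambda + 1)^{n-1}}. \tag{12.5.26}$$

Then, similarly to (12.5.20), we find that

$$\operatorname{ind}_T \mathfrak{v}_1 \to n \quad \text{as } T \to \infty, \qquad |n + \operatorname{ind}_T \mathfrak{v}_2| < 1.$$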

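3. We show that $1 - \varphi(\lambda)$ has exactly $n - 1$ zeros in $\Pi_-$. To this end, note that the function $\mathfrak{v}_2$, which is analytic in $\Pi_-$ and continuous on $\Pi_- \cup \Pi$, has exactly $n$ zeros in $\Pi_-$. Indeed, as in the proof of Theorem 12.5.1, consider the contour used there. In the same way as in the argument in that proof, we obtain that, along this contour,

$$\frac{1}{2\pi}\int d(\arg\mathfrak{v}_2(\lambda)) = n,$$

so that $\mathfrak{v}_2$ has exactly $n$ zeros in $\Pi_-$. Further, by (12.5.26) we have $\mathfrak{v}_2 = \mathfrak{v}_3\mathfrak{v}_4$, where the function $\mathfrak{v}_3 = (1 - i\lambda)/(i\lambda + 1)$ has one zero in $\Pi_-$ at the point $\lambda = -i$. Therefore the function

$$\mathfrak{v}_4 = \frac{Q_n - P_m - Q_n\varphi^-}{\lambda^2(i\lambda + 1)^{n-2}},$$

which is analytic in $\Pi_-$, has $n - 1$ zeros there. Since the zeros of $1 - \varphi(\lambda)$ and those of $\mathfrak{v}_4(\lambda)$ in $\Pi_-$ coincide, the assertion concerning the zeros of $1 - \varphi(\lambda)$ is proved.

4. It remains to put

$$\mathfrak{w}^0_+(\lambda) := \frac{\prod_{k=1}^{n-1} (\mu_k - i\lambda)\,(1 - i\lambda)}{Q_n(\lambda)}, \qquad \mathfrak{w}^0_-(\lambda) := \frac{(1 - \varphi(\lambda))(i\lambda + 1)Q_n(\lambda)}{\lambda^2 \prod_{k=1}^{n-1} (\mu_k - i\lambda)}$$

and note that $\mathfrak{w}^0_\pm \in \mathcal{K}_\pm \cap \mathcal{K}$.

Necessity is proved in exactly the same way as in Theorems 12.5.1 and 12.5.2. The theorem is proved. □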


12.6  Explicit  Form  of  Factorisation  in  the  Arithmetic  Case 

The  content  of  this  section  is  similar  to  that  of  Sect.  12.5  and  has  the  same  structure, 
but  there  are  also  some  significant  differences. 
12.6.1 Preliminary Remarks on the Uniqueness of Factorisation


As was already noted in Sect. 12.5, for arithmetic distributions defined by collections of probabilities $p_k = \mathbf{P}(\xi = k)$, we should use, instead of the ch.f.s $\varphi(\lambda)$, the generating functions

$$p(z) = \mathbf{E}z^{\xi} = \sum_{k=-\infty}^{\infty} z^k p_k$$

defined on the unit circle $|z| = 1$, which will be denoted by $\Pi$, as the axis $\operatorname{Im}\lambda = 0$ was in Sect. 12.5. The symbols $\Pi_+$ ($\Pi_-$) will denote the interior (exterior) of $\Pi$. For arithmetic distributions we will discuss the factorisation

$$1 - p(z) = \mathfrak{f}_+(z)\mathfrak{f}_-(z)$$

on the unit circle, where $\mathfrak{f}_\pm$ are analytic on $\Pi_\pm$ and continuous including the boundary $\Pi$. Similarly to the non-lattice case, the classes of such functions that, moreover, are bounded and bounded away from zero on $\Pi_\pm$, we will denote by $\mathcal{K}_\pm$. Continuous bounded functions on $\Pi$, which are also bounded away from zero, form the class $\mathcal{K}$. The notion of canonical factorisation on $\Pi$ is introduced in exactly the same way as above. Factorisation components must belong to the classes $\mathcal{K}_\pm$. The uniqueness of factorisation components (up to a constant factor) is proved in the same way as in Lemma 12.1.1.

We now show that if, similarly to the above, we “tweak” the function $1 - p(z)$ then it will admit a canonical factorisation. We will denote the tweaked function and its factorisation components by the same symbols as in Sect. 12.5. This will not lead to any confusion.


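Lemma 12.6.1 1. If $\mathbf{E}\xi < 0$ then the function

$$\mathfrak{v}(z) := \frac{(1 - p(z))z}{1 - z}$$

belongs to $\mathcal{K}$ and admits a unique canonical factorisation

$$\mathfrak{v}(z) = \mathfrak{v}_+(z)\mathfrak{v}_-(z),$$

where

$$\mathfrak{v}_+(z) := 1 - \mathbf{E}(z^{\chi_+};\ \eta_+ < \infty) = \frac{1 - p}{\mathbf{E}z^{S}}, \quad p := \mathbf{P}(\eta_+ < \infty), \qquad \mathfrak{v}_-(z) := \frac{(1 - \mathbf{E}z^{\chi_-^0})z}{1 - z}.$$

2. If $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 < \infty$ then the function

$$\mathfrak{v}^0(z) := \frac{(1 - p(z))z}{(1 - z)^2}$$

belongs to $\mathcal{K}$ and admits a unique canonical factorisation

$$\mathfrak{v}^0(z) = \mathfrak{v}^0_+(z)\mathfrak{v}^0_-(z),$$

where

$$\mathfrak{v}^0_+ := \frac{1 - \mathbf{E}z^{\chi_+}}{1 - z}, \qquad \mathfrak{v}^0_- := \frac{(1 - \mathbf{E}z^{\chi_-^0})z}{1 - z}.$$


Here we do not discuss the case $\mathbf{E}\xi > 0$ since it reduces to the case $\mathbf{E}\xi < 0$. We will also not present an analogue of Corollary 12.5.1 in view of its obviousness.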


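Proof of Lemma 12.6.1 Let $\mathbf{E}\xi < 0$. Since

$$-\frac{(1 - p(z))z}{1 - z} \to -\mathbf{E}\xi > 0$$

as $z \to 1$, $p(z)$ is continuous on the compact $\Pi$ and, furthermore, $|p(z)| < 1$ for $z \ne 1$, we see that $\mathfrak{v}(z)$ is bounded away from zero on $\Pi$ and bounded, and hence belongs to $\mathcal{K}$. Further, by Corollary 12.2.2 (see (12.2.1) for $e^{i\lambda} = z$),

$$\mathfrak{v}(z) = \frac{(1 - \mathbf{E}z^{\chi_-^0})z}{1 - z}\,\bigl[1 - \mathbf{E}(z^{\chi_+};\ \eta_+ < \infty)\bigr],$$

where $\mathbf{E}\chi_-^0 \in (-\infty, 0)$. Therefore, similarly to the above, we get

$$\mathfrak{v}_-(z) = \frac{(1 - \mathbf{E}z^{\chi_-^0})z}{1 - z} \in \mathcal{K}.$$

Moreover, it is obvious that $\mathfrak{v}_-(z) \in \mathcal{K}_-$. In the same way as above, we obtain that

$$\mathfrak{v}_+(z) = 1 - \mathbf{E}(z^{\chi_+};\ \eta_+ < \infty) \in \mathcal{K}_+ \cap \mathcal{K}.$$

This proves the first assertion of the lemma.

The second assertion is proved similarly by using Corollary 12.2.4, by which

$$\mathfrak{v}^0(z) = \frac{1 - \mathbf{E}z^{\chi_+}}{1 - z} \cdot \frac{(1 - \mathbf{E}z^{\chi_-^0})z}{1 - z}.$$

Next, as before, we establish that $\mathfrak{v}^0 \in \mathcal{K}$ and that the factors on the right-hand side, denoted by $\mathfrak{v}^0_\pm(z)$, belong to $\mathcal{K}_\pm \cap \mathcal{K}$. The lemma is proved. □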


12.6.2 The Classes of Distributions on the Positive Half-Line with Rational Generating Functions

The content of Sect. 12.5.2 is mostly preserved here. Now by exponential polynomials we mean the sequences

$$p_x = \sum_{k=1}^{K} \sum_{j=1}^{l_k} a_{kj}\, x^{j-1} q_k^{x}, \qquad x = 1, 2, \ldots, \tag{12.6.1}$$

where $q_k < 1$ are different (cf. (12.5.10)). To probabilities $p_x$ of such type will correspond rational functions

$$p^+(z) = \mathbf{E}(z^{\xi};\ \xi > 0) = \sum_{x=1}^{\infty} z^x p_x = \frac{P_m(z)}{Q_n(z)},$$

where $1 \le m \le n$, $n = \sum_{k=1}^{K} l_k$, and, for definiteness, we put

$$Q_n(z) = \prod_{k=1}^{K} (1 - q_k z)^{l_k}. \tag{12.6.2}$$

Here a significant difference from the non-lattice case is that, for $p^+(z)$ to be rational, we do not need (12.6.1) to be valid for all $x > 0$. It is sufficient that (12.6.1) holds for all $x$, starting from some $r + 1 \ge 1$. The first $r$ probabilities $p_1, \ldots, p_r$ can be arbitrary. In this case $p^+(z)$ will have the form

$$p^+(z) = \frac{P_m(z)}{Q_n(z)} + T_r(z) = \frac{P_M(z)}{Q_n(z)}, \tag{12.6.3}$$

where $T_r$ is a polynomial of degree $r$ (for $r = 0$ we put $T_0 = 0$), so that $p^+$ is again a rational function, but now the degree of the polynomial $P_M$,

$$M = \begin{cases} m, & \text{if } r = 0, \\ n + r, & \text{if } r \ge 1, \end{cases} \tag{12.6.4}$$

in the numerator can be greater than the degree $n$ of the polynomial in the denominator. In what follows, we only assume that $n + r > 0$, so that the value $n = 0$ is allowed (in this case there will be no exponential part in (12.6.1)). In that case we will assume that $Q_0 = 1$ and $P_m = 0$. The distributions corresponding to (12.6.3) will also be called exponential polynomials.
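
The form (12.6.3) can be checked numerically on a small case. The sketch below is only an illustration under assumed values that do not come from the text: a single exponential term ($n = 1$, $p_x = c q^{x-1}$ for $x \ge r + 1$), $r = 2$ arbitrary initial probabilities, and the hypothetical helper names `p_x`, `series` and `rational`. It compares the truncated series $\sum_x z^x p_x$ with a single ratio whose numerator has degree $n + r = 3$.

```python
# Assumed illustrative parameters: one exponential term (n = 1), r = 2 free initial values.
q, c = 0.5, 0.2
head = {1: 0.05, 2: 0.07}          # arbitrary p_1, p_2 (hypothetical values)
r = len(head)

def p_x(x):
    """p_x is arbitrary for x <= r and equals c * q^(x-1) for x >= r + 1."""
    return head[x] if x <= r else c * q ** (x - 1)

def series(z, terms=400):
    """Truncated series sum_{x>=1} z^x p_x, i.e. E(z^xi; xi > 0)."""
    return sum(z ** x * p_x(x) for x in range(1, terms + 1))

def rational(z):
    """P_M(z) / Q_n(z) with Q_1(z) = 1 - q z and a numerator of degree n + r = 3."""
    numerator = (head[1] * z + head[2] * z ** 2) * (1 - q * z) + c * q ** r * z ** (r + 1)
    return numerator / (1 - q * z)

z = 0.9
print(series(z), rational(z))
```

The two printed values should agree up to rounding, since the geometric part of the series sums to $c q^{r} z^{r+1}/(1 - qz)$.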


12.6.3 Explicit Canonical Factorisation of the Function $\mathfrak{v}(z)$ in the Case when the Right Tail of the Distribution F Is an Exponential Polynomial

Consider an arithmetic distribution F on the whole real line $(-\infty, \infty)$, $\mathbf{E}\xi < 0$, which is an exponential polynomial on the half-line $x > 0$. As before, denote the class of all such distributions by $\mathcal{EP}$. The generating function $p(z)$ of the distribution $F \in \mathcal{EP}$ can be represented as

$$p(z) = p^+(z) + p^-(z),$$

where the function

$$p^-(z) = \mathbf{E}(z^{\xi};\ \xi \le 0)$$

is analytic in $\Pi_-$ and continuous including the boundary $\Pi$, and $p^+(z)$ is a rational function

$$p^+(z) = \mathbf{E}(z^{\xi};\ \xi > 0) = \frac{P_M(z)}{Q_n(z)}$$

analytic in $\Pi_+$.

As above, in this case the canonical factorisation of the function

$$\mathfrak{v}(z) = \frac{(1 - p(z))z}{1 - z}$$

can be found in explicit form in terms of the zeros of the function $1 - p(z)$.


Theorem 12.6.1 Let there exist $\mathbf{E}\xi < 0$. For the positive component $\mathfrak{w}_+(z)$ of the canonical factorisation

$$\mathfrak{v}(z) = \mathfrak{w}_+(z)\mathfrak{w}_-(z), \qquad \mathfrak{w}_\pm \in \mathcal{K}_\pm \cap \mathcal{K},$$

to be a rational function it is necessary and sufficient that $p^+(z) = \mathbf{E}(z^{\xi};\ \xi > 0)$ is a rational function.

If $p^+ = P_M/Q_n$, where $M$ is defined in (12.6.4), is an uncancellable ratio of polynomials then the function $1 - p(z)$ has in $\Pi_-$ exactly $n + r$ zeros, which will be denoted by $z_1, \ldots, z_{n+r}$, and

$$\mathfrak{w}_+(z) = \frac{\prod_{k=1}^{n+r} (z_k - z)}{Q_n(z)},$$

where $Q_n(z_k) \ne 0$.

If we arrange the zeros $\{z_k\}$ according to the values of $|z_k|$ in ascending order, then the point $z_1 > 1$ is a simple real zero.


The theorem implies that

$$\mathfrak{w}_-(z) = \frac{(1 - p(z))z\, Q_n(z)}{(1 - z) \prod_{k=1}^{n+r} (z_k - z)}.$$

By Lemma 12.6.1, from Theorem 12.6.1 we obtain the following representation.


Corollary 12.6.1 If $\mathbf{E}\xi < 0$ and $p^+ = P_M/Q_n$ then

$$\mathbf{E}z^{S} = \frac{\mathfrak{w}_+(1)}{\mathfrak{w}_+(z)} = \frac{Q_n(z) \prod_{k=1}^{n+r} (z_k - 1)}{Q_n(1) \prod_{k=1}^{n+r} (z_k - z)}.$$

Similarly to (12.5.16), we can also write down the explicit form of $\mathbf{E}z^{\chi_-^0}$ and $\mathbf{E}(z^{\chi_+};\ \eta_+ < \infty)$.


Proof of Theorem 12.6.1 The proof is similar to that of Theorem 12.5.1.

Sufficiency.

1. In the vicinity of the point $z = 1$ in $\Pi$ the value of $-\mathfrak{v}(z)$ lies in the vicinity of the point $-\mathbf{E}\xi > 0$. Outside a neighbourhood of the point $z = 1$ we have, for $z \in \Pi$,

$$\arg(1 - p(z)) \in \Bigl(-\frac{\pi}{2}, \frac{\pi}{2}\Bigr), \qquad \arg\frac{z}{z - 1} = -\arg(1 - z^{-1}) \in \Bigl(-\frac{\pi}{2}, \frac{\pi}{2}\Bigr).$$

This implies that, for $z \in \Pi$,

$$\arg(-\mathfrak{v}(z)) \in (-\pi, \pi),$$

and hence the trajectory of $-\mathfrak{v}(z)$, $z \in \Pi$, never intersects the ray $\arg\mathfrak{v} = -\pi$:

$$\operatorname{ind}\mathfrak{v} := \frac{1}{2\pi}\int_0^{2\pi} d(\arg\mathfrak{v}(e^{i\varphi})) = 0.$$


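2. Represent the function $\mathfrak{v}$ as $\mathfrak{v} = \mathfrak{v}_1\mathfrak{v}_2$, where

$$\mathfrak{v}_1(z) := \frac{z^{n+r}}{Q_n(z)}, \qquad \mathfrak{v}_2(z) := \frac{Q_n(z) - P_M(z) - p^-(z)Q_n(z)}{(1 - z)z^{n+r-1}}.$$

We show that

$$\operatorname{ind}\mathfrak{v}_2 = -n - r. \tag{12.6.5}$$

In order to do this, we first note that the function $\mathfrak{v}_1$ is analytic in $\Pi_+$ and has there a zero of multiplicity $n + r$. Hence by the argument principle $\operatorname{ind}\mathfrak{v}_1 = n + r$. Since $0 = \operatorname{ind}\mathfrak{v} = \operatorname{ind}\mathfrak{v}_1 + \operatorname{ind}\mathfrak{v}_2$, we obtain the desired relation.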

3. We show that $1 - p(z)$ has exactly $n + r$ zeros in $\Pi_-$. The function $\mathfrak{v}_2(z)$ is analytic on $\Pi_-$ and continuous including the boundary $\Pi$. The positively oriented contour $\Pi$, which contains $\Pi_+$, corresponds to the negatively oriented contour with respect to $\Pi_-$. By (12.6.5) this means that $\mathfrak{v}_2(z)$ has precisely $n + r$ zeros on $\Pi_-$ while the point $z = \infty$ is not a zero since the numerator and the denominator of $\mathfrak{v}_2(z)$ grow as $|z|^{n+r}$ as $|z| \to \infty$.

4. Denote the zeros of $\mathfrak{v}_2$ by $z_1, \ldots, z_{n+r}$ and put

$$\mathfrak{w}_+(z) := \frac{\prod_{k=1}^{n+r} (z_k - z)}{Q_n(z)}, \qquad \mathfrak{w}_-(z) := \frac{Q_n(z)(1 - p(z))z}{(1 - z) \prod_{k=1}^{n+r} (z_k - z)}.$$

It is easy to see that $\mathfrak{w}_\pm \in \mathcal{K}_\pm \cap \mathcal{K}$. The fact that $Q_n(z_k) \ne 0$ and $z_1$ is a simple real zero of $1 - p(z)$ is proved in the same way as in Theorem 12.5.1.

Necessity is also established in the same fashion as in Theorem 12.5.1. The theorem is proved. □


Clearly, in the arithmetic case we have complete analogues of Examples 12.5.1 and 12.5.2. In particular, if

$$\mathbf{P}(\xi = k) = c q^{k-1}, \qquad c < (1 - q), \quad k = 1, 2, \ldots,$$

then

$$\mathfrak{w}_+(z) = \frac{z_1 - z}{1 - qz}, \qquad \mathbf{E}z^{S} = \frac{(1 - qz)(z_1 - 1)}{(z_1 - z)(1 - q)},$$

$$\mathbf{P}(S = 0) = \frac{1 - z_1^{-1}}{1 - q}, \qquad \mathbf{P}(S = k) = \frac{(z_1^{-1} - q)(z_1 - 1)\, z_1^{-k}}{1 - q}, \quad k \ge 1.$$

In contrast to Sect. 12.5, here one can give another example where the distribution of $S$ is geometric.
Example 12.6.1 Let $\mathbf{P}(\xi = 1) = p_1 > 0$ and $\mathbf{P}(\xi \ge 2) = 0$. In this case $\chi_+ = 1$ on the set $\{\eta_+ < \infty\}$, and to find the distribution of $S$ there is no need to use Theorem 12.6.1. Indeed, $\mathbf{P}(S = 0) = 1 - p = \mathbf{P}(\eta_+ = \infty)$. If $\eta_+ < \infty$ then the trajectory $\xi_{\eta_+ + 1}, \xi_{\eta_+ + 2}, \ldots$ is distributed identically to $\xi_1, \xi_2, \ldots$ and hence

$$S = \begin{cases} 0 & \text{with probability } 1 - p, \\ \chi_+ + S_{(1)} & \text{with probability } p, \end{cases}$$

where the variable $S_{(1)}$ is distributed identically to $S$, $\chi_+ = 1$. This yields

$$\mathbf{E}z^{S} = (1 - p) + pz\,\mathbf{E}z^{S}, \qquad \mathbf{E}z^{S} = \frac{1 - p}{1 - pz},$$

$$\mathbf{P}(S = k) = (1 - p)p^{k}, \qquad k = 0, 1, \ldots$$

By virtue of identity (12.3.3) (for $e^{i\lambda} = z$) the point $z_1 = p^{-1}$ is necessarily a zero of the function $1 - p(z)$.
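
The last claim of the example can be checked numerically on the simplest skip-free walk. The sketch below assumes, purely for illustration, that $\xi$ takes only the values $\pm 1$ with $\mathbf{P}(\xi = 1) = p_1 = 0.3$; these values and the helper name `gen_fn` are not from the text. Under this assumption $1 - p(z) = 0$ reduces to a quadratic equation, and its root outside the unit circle should be $z_1 = p^{-1}$.

```python
import math

# Assumed illustrative walk: P(xi = 1) = p1, P(xi = -1) = q1 = 1 - p1, so E xi < 0.
p1 = 0.3
q1 = 1.0 - p1

def gen_fn(z):
    """Generating function p(z) = E z^xi of the assumed +-1 walk."""
    return p1 * z + q1 / z

# 1 - p(z) = 0  <=>  p1*z^2 - z + q1 = 0; its roots are z = 1 and z = q1/p1.
disc = math.sqrt(1.0 - 4.0 * p1 * q1)
z1 = (1.0 + disc) / (2.0 * p1)      # the zero lying outside the unit circle

p = 1.0 / z1                        # P(eta_+ < infinity), since z1 = 1/p by the example
dist_S = [(1.0 - p) * p ** k for k in range(200)]   # P(S = k) = (1 - p) p^k

print(z1, p, sum(dist_S))
```

For this walk the computed $p$ equals $p_1/q_1$, the classical probability that a simple random walk with negative drift ever reaches level 1.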


12.6.4 Explicit Canonical Factorisation of the Function $\mathfrak{v}(z)$ when the Left Tail of the Distribution F Is an Exponential Polynomial


We now consider the case where the distribution F on the negative half-line can be represented as an exponential polynomial, up to the values of $\mathbf{P}(\xi = -k)$ at finitely many points $0, -1, -2, \ldots, -r$. In this case, the value of $p^-(z)$ is derived similarly to that of $p^+(z)$ in (12.6.3) by replacing $z$ with $z^{-1}$:

$$p^-(z) = \mathbf{E}(z^{\xi};\ \xi \le 0) = \frac{z^{n-M} P_M(z)}{Q_n(z)},$$

where $Q_n$ and $P_M$ are polynomials (which differ from (12.6.3)),

$$M = \begin{cases} m, & \text{if } r = 0, \\ n + r, & \text{if } r \ge 1, \end{cases} \qquad Q_n(z) = \prod_{k=1}^{K} (z - q_k)^{l_k},$$

and all $q_k < 1$ are distinct.

Theorem 12.6.2 Let there exist $\mathbf{E}\xi < 0$. For the positive component of the canonical factorisation

$$\mathfrak{v}(z) = \mathfrak{w}_+(z)\mathfrak{w}_-(z)$$

to be representable as

$$\mathfrak{w}_+(z) = (1 - p(z))R(z),$$

where $R(z)$ is a rational function, it is necessary and sufficient that $p^-(z)$ is a rational function. If

$$p^-(z) = \frac{z^{n-M} P_M(z)}{Q_n(z)},$$

where $P_M$ and $Q_n$ are defined in (12.6.2) and (12.6.3), then the function $1 - p(z)$ has in $\Pi_+$ exactly $n + r - 1$ zeros that we denote by $z_1, \ldots, z_{n+r-1}$, and

$$R(z) = \frac{Q_n(z)}{(1 - z) \prod_{k=1}^{n+r-1} (z - z_k)}.$$


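Proof The proof is very close to that of Theorems 12.5.2 and 12.6.1. Therefore we will only present a brief proof of sufficiency.

1. As in Theorem 12.6.1, one can verify that

$$\operatorname{ind}\mathfrak{v} = 0.$$

2. Represent $\mathfrak{v}(z)$ as $\mathfrak{v} = \mathfrak{v}_1\mathfrak{v}_2$, where

$$\mathfrak{v}_1 := \frac{(Q_n(z) - z^{n-M} P_M(z) - p^+(z)Q_n(z))z^{r}}{1 - z}, \qquad \mathfrak{v}_2(z) := \frac{z^{1-r}}{Q_n(z)}.$$

The function $\mathfrak{v}_2$ is analytic in $\Pi_-$, continuous including the boundary $\Pi$, and has a zero at $z = \infty$ of multiplicity $n + r - 1$, so that

$$\operatorname{ind}\mathfrak{v}_2 = -(n + r - 1).$$

Since $\operatorname{ind}\mathfrak{v}_1 = -\operatorname{ind}\mathfrak{v}_2 = n + r - 1$, the function $\mathfrak{v}_1$, which is analytic in $\Pi_+$, has there, by the argument principle, $n + r - 1$ zeros $z_1, \ldots, z_{n+r-1}$. The function $1 - p(z)$ has the same zeros.

3. By putting

$$\mathfrak{w}_+(z) := \frac{(1 - p(z))Q_n(z)}{(1 - z) \prod_{k=1}^{n+r-1} (z - z_k)}, \qquad \mathfrak{w}_-(z) := \frac{z \prod_{k=1}^{n+r-1} (z - z_k)}{Q_n(z)}$$

we obtain $\mathfrak{w}_\pm \in \mathcal{K}_\pm \cap \mathcal{K}$. The theorem is proved. □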


12.6.5 Explicit Factorisation of the Function $\mathfrak{v}^0(z)$


By virtue of the remarks at the beginning of Sect. 12.5.5 it is sufficient to consider factorisation of the function

$$\mathfrak{v}^0(z) = \frac{(1 - p(z))z}{(1 - z)^2}$$

for $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 < \infty$ just in the case when the function

$$p^+(z) = \mathbf{E}(z^{\xi};\ \xi > 0) = \frac{P_M(z)}{Q_n(z)}$$

is rational, where $Q_n(z) = \prod_{k=1}^{K} (1 - q_k z)^{l_k}$, $n = \sum_{k=1}^{K} l_k$ (see (12.6.2), (12.6.3)).


Theorem 12.6.3 Let $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 = \sigma^2 < \infty$. For the positive component $\mathfrak{w}^0_+(z)$ of the canonical factorisation

$$\mathfrak{v}^0(z) = \mathfrak{w}^0_+(z)\mathfrak{w}^0_-(z), \qquad \mathfrak{w}^0_\pm \in \mathcal{K}_\pm \cap \mathcal{K},$$

to be rational, it is necessary and sufficient that the function $p^+(z)$ is rational. If $p^+(z) = P_M(z)/Q_n(z)$, where $M$ is defined in (12.6.4), is an uncancellable ratio of polynomials then the function $1 - p(z)$ has in $\Pi_-$ exactly $n + r - 1$ zeros that we denote by $z_1, \ldots, z_{n+r-1}$, and

$$\mathfrak{w}^0_+(z) = \frac{\prod_{k=1}^{n+r-1} (z_k - z)}{Q_n(z)}, \qquad \mathfrak{w}^0_-(z) = \frac{(1 - p(z))z\, Q_n(z)}{(1 - z)^2 \prod_{k=1}^{n+r-1} (z_k - z)}.$$

Corollaries similar to Corollary 12.5.4 hold true here as well.


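Proof of Theorem 12.6.3 The proof is similar to those of Theorems 12.5.3, 12.6.1 and 12.6.2. Therefore, as in the previous theorem, we restrict ourselves to the key elements of the proof of sufficiency.

1. In the vicinity of the point $z = 1$ in $\Pi$, the value of $-\mathfrak{v}^0(z)$ lies in the vicinity of the point $\sigma^2/2 > 0$. Outside of a neighbourhood of the point $z = 1$, for $z \in \Pi$ we have

$$\arg(1 - p(z)) \in \Bigl(-\frac{\pi}{2}, \frac{\pi}{2}\Bigr), \qquad \arg\frac{-z}{(1 - z)^2} = -\arg\bigl((1 - z)(1 - \bar{z})\bigr) = -\arg|1 - z|^2 = 0.$$

Hence

$$\operatorname{ind}\mathfrak{v}^0 := \frac{1}{2\pi}\int_0^{2\pi} d(\arg\mathfrak{v}^0(e^{i\varphi})) = 0.$$

2. Represent the function $\mathfrak{v}^0(z)$ as

$$\mathfrak{v}^0(z) = \mathfrak{v}_1(z)\mathfrak{v}_2(z),$$

where

$$\mathfrak{v}_1(z) := \frac{z^{n+r-1}}{Q_n(z)}, \qquad \mathfrak{v}_2(z) := \frac{Q_n - P_M - p^-(z)Q_n}{(1 - z)^2 z^{n+r-2}}.$$

As before, we show that $\operatorname{ind}\mathfrak{v}_1 = n + r - 1$ and that $1 - p(z)$ has, on $\Pi_-$, exactly $n + r - 1$ zeros, which are denoted by $z_1, \ldots, z_{n+r-1}$. It remains to put

$$\mathfrak{w}^0_+(z) := \frac{\prod_{k=1}^{n+r-1} (z_k - z)}{Q_n(z)}, \qquad \mathfrak{w}^0_-(z) := \frac{Q_n(z)(1 - p(z))z}{(1 - z)^2 \prod_{k=1}^{n+r-1} (z_k - z)}.$$

The theorem is proved. □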


12.7 Asymptotic Properties of the Distributions of $\chi_\pm$ and $S$

We saw in the previous sections that one can find the distributions of the variables $S$ and $\chi_\pm$ in explicit form only in some special cases. Meanwhile, in applied problems of, say, risk theory (see Sect. 12.4) one is interested in the values of $\mathbf{P}(S > x)$ for large $x$ (corresponding to small ruin probabilities). In this connection there arises the problem of the asymptotic behaviour of $\mathbf{P}(S > x)$ as $x \to \infty$, as well as related problems on the asymptotics of $\mathbf{P}(|\chi_\pm| > x)$. It turns out that these problems can be solved under rather broad conditions.


12.7.1 The Asymptotics of $\mathbf{P}(\chi_+ > x \mid \eta_+ < \infty)$ and $\mathbf{P}(\chi_-^0 < -x)$ in the Case $\mathbf{E}\xi \le 0$

We  introduce  some  classes  of  functions  that  will  be  used  below. 


Definition 12.7.1 A function $G(t)$ is called (asymptotically) locally constant (l.c.) if, for any fixed $v$,

$$\frac{G(t + v)}{G(t)} \to 1 \quad \text{as } t \to \infty. \tag{12.7.1}$$

It is not hard to see that, say, the functions $G(t) = t^{\alpha}[\ln(1 + t)]^{\gamma}$, $t > 0$, are l.c.
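
A quick numerical look at (12.7.1) for the functions just mentioned may be helpful. The following is a minimal sketch; the exponents $\alpha = -1.5$, $\gamma = 2$, the shift $v = 5$ and the helper names `G` and `shift_ratio` are illustrative assumptions rather than anything prescribed by the text.

```python
import math

def G(t, alpha=-1.5, gamma=2.0):
    """G(t) = t^alpha * [ln(1 + t)]^gamma with illustrative exponents."""
    return t ** alpha * math.log(1.0 + t) ** gamma

def shift_ratio(t, v):
    """The ratio G(t + v) / G(t) appearing in (12.7.1)."""
    return G(t + v) / G(t)

v = 5.0
for t in (1e1, 1e3, 1e5):
    # The printed ratio should approach 1 as t grows, for the fixed shift v.
    print(t, shift_ratio(t, v))
```

By contrast, for an exponential function $e^{-\beta t}$ with $\beta > 0$ the same ratio equals $e^{-\beta v}$ for every $t$, so such a function is not l.c.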

We denote the class of all l.c. functions by $\mathcal{L}$. The properties of functions from $\mathcal{L}$ are studied in Appendix 6. In particular, it is established that (12.7.1) holds uniformly in $v$ on any fixed segment, and that $G(t) = e^{o(t)}$ and $G(t) = o(G^{I}(t))$ as $t \to \infty$, where

$$G^{I}(t) := \int_t^{\infty} G(u)\, du. \tag{12.7.2}$$


Denote by $\mathcal{E}$ the class of distributions satisfying the right-hand side Cramér condition (the exponential class). The class $\mathcal{E}^* \subset \mathcal{E}$ of distributions $G$ whose “tails” $G(t) = G((t, \infty))$ satisfy, for any fixed $v > 0$, the relation

$$\frac{G(t + v)}{G(t)} \to 0 \quad \text{as } t \to \infty \tag{12.7.3}$$

could be called the “superexponential” class. For example, the normal distribution belongs to $\mathcal{E}^*$. In the arithmetic case, one has to put $v = 1$ in (12.7.3) and consider integer-valued $t$.

In the case $\mathbf{E}\xi \le 0$ it is convenient to introduce a random variable $\chi$ with the distribution

$$\mathbf{P}(\chi \in dv) = \mathbf{P}(\chi_+ \in dv \mid \eta_+ < \infty) = \frac{\mathbf{P}(\chi_+ \in dv;\ \eta_+ < \infty)}{p}, \qquad p = \mathbf{P}(\eta_+ < \infty).$$

If $\mathbf{E}\xi = 0$ then the distributions of $\chi$ and $\chi_+$ coincide. In the sequel we will confine ourselves to non-lattice $\xi$ (then $\chi_\pm$ will also be non-lattice). In the arithmetic case everything will look quite similar.

Denote by $F_+(t)$ the right “tail” of the distribution F: $F_+(t) := F((t, \infty))$ and put

$$F_+^{I}(t) := \int_t^{\infty} F_+(u)\, du.$$
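Theorem 12.7.1 Let there exist $\mathbf{E}\xi \le 0$ and, in the case $\mathbf{E}\xi = 0$, assume $\mathbf{E}\xi^2 < \infty$ holds.

1. If $F_+(t) = o(F_+^{I}(t))$ as $t \to \infty$ then, as $x \to \infty$,

$$\mathbf{P}(\chi > x) \sim -\frac{F_+^{I}(x)}{p\,\mathbf{E}\chi_-^0}. \tag{12.7.4}$$

2. If $F_+(t) = V(t)e^{-\beta t}$, $\beta > 0$, $V \in \mathcal{L}$ then

$$\mathbf{P}(\chi > x) \sim \frac{F_+(x)}{p(1 - \mathbf{E}e^{\beta\chi_-^0})}. \tag{12.7.5}$$

3. If $F_+ \in \mathcal{E}^*$ then

$$\mathbf{P}(\chi > x) \sim \frac{F_+(x)}{p\,\mathbf{P}(\chi_-^0 < 0)}. \tag{12.7.6}$$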


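Proof The proof is based on identity (12.2.1) of Corollary 12.2.2, which can be rewritten as

$$1 - p\,\mathbf{E}e^{i\lambda\chi} = \frac{1 - \varphi(\lambda)}{1 - \varphi_-^0(\lambda)}, \qquad \varphi_-^0(\lambda) := \mathbf{E}e^{i\lambda\chi_-^0}. \tag{12.7.7}$$

Introduce the renewal function $H_-(t)$ corresponding to the random variable $\chi_-^0 \le 0$:

$$H_-(t) = \sum_{k=0}^{\infty} \mathbf{P}(H_k \ge t), \qquad H_k = \chi_-^{(1)} + \cdots + \chi_-^{(k)},$$

where $\chi_-^{(j)}$ are independent copies of $\chi_-^0$, $a_- := \mathbf{E}\chi_-^0 > -\infty$. As was noted in Sect. 10.1, the function $1/(1 - \varphi_-^0(\lambda))$ can be represented as

$$\frac{1}{1 - \varphi_-^0(\lambda)} = -\int_{-\infty}^{0} e^{i\lambda t}\, dH_-(t)$$

(the function $H_-(t)$ decreases). Therefore, for $x > 0$ and any $N > 0$, we obtain from (12.7.7) that

$$p\,\mathbf{P}(\chi > x) = -\int_{-\infty}^{0} dH_-(t)\, F_+(x - t) = -\int_{-N}^{0} - \int_{-\infty}^{-N}. \tag{12.7.8}$$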


Here, by the condition of assertion 1,

$$-\int_{-N}^{0} dH_-(t)\, F_+(x - t) \le F_+(x)\,[H_-(-N) - H_-(0)] = o(F_+^{I}(x)) \quad \text{as } x \to \infty.$$

Evidently, this relation will still be true when $N \to \infty$ slowly enough as $x \to \infty$. Furthermore, by the local renewal theorem, as $N \to \infty$,

$$-\int_{-\infty}^{-N} dH_-(t)\, F_+(x - t) \sim \int_{-\infty}^{-N} \frac{dt}{|a_-|}\, F_+(x - t) = \frac{F_+^{I}(x + N)}{|a_-|}. \tag{12.7.9}$$

For a formal justification of this relation, the interval $(-\infty, -N]$ should be divided into small intervals $(-N_{k+1}, -N_k]$, $k = 0, 1, \ldots$, $N_0 = N$, $N_{k+1} > N_k$, on each of which we use the local renewal theorem. Since $F_+(x - t)$ lies between $F_+(x + N_{k+1})$ and $F_+(x + N_k)$ for $t \in (-N_{k+1}, -N_k]$, this gives

$$\frac{F_+(x + N_{k+1})(N_{k+1} - N_k)}{|a_-|}\,(1 + o(1)) \le -\int_{-N_{k+1}}^{-N_k} dH_-(t)\, F_+(x - t) \le \frac{F_+(x + N_k)(N_{k+1} - N_k)}{|a_-|}\,(1 + o(1)).$$

From here it is not difficult to obtain the required bounds for the left-hand side of (12.7.9) that are asymptotically equivalent to the right-hand side. Since, for $N$ growing slowly enough,

$$F_+^{I}(x) - F_+^{I}(x + N) = \int_x^{x+N} F_+(u)\, du \le F_+(x)N = o(F_+^{I}(x)),$$

one has $F_+^{I}(x + N) \sim F_+^{I}(x)$, and we finally obtain the relation

$$p\,\mathbf{P}(\chi > x) \sim \frac{F_+^{I}(x + N)}{|a_-|} \sim \frac{F_+^{I}(x)}{|a_-|}.$$

This proves (12.7.4).

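If $F_+(t) = V(t)e^{-\beta t}$, $V \in \mathcal{L}$, then we find from (12.7.8) that

$$p\,\mathbf{P}(\chi > x) \sim -V(x)e^{-\beta x}\int_{-\infty}^{0} dH_-(t)\, e^{t\beta} = \frac{F_+(x)}{1 - \mathbf{E}e^{\beta\chi_-^0}}.$$

This proves (12.7.5).

Now let $F_+ \in \mathcal{E}^*$. If we denote by $h_0 > 0$ the jump of the function $H_-(t)$ at the point 0 then, clearly,

$$-\int_{-\infty}^{0} dH_-(t)\, \frac{F_+(x - t)}{F_+(x)} \to h_0 \quad \text{as } x \to \infty,$$

and hence

$$p\,\mathbf{P}(\chi > x) \sim F_+(x)h_0.$$

If we put $q := \mathbf{P}(\chi_-^0 = 0)$ then $h_0$, being the average time spent by the random walk $\{H_k\}$ at the point 0, equals

$$h_0 = \sum_{k=0}^{\infty} q^k = \frac{1}{1 - q}.$$

The theorem is proved. □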


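Now consider the asymptotics of $\mathbf{P}(\chi_-^0 < -x)$ as $x \to \infty$. Put $F_-(t) := F((-\infty, -t)) = \mathbf{P}(\xi < -t)$.


Theorem 12.7.2 Let $\mathbf{E}\xi < 0$.

1. If $F_- \in \mathcal{L}$ then, as $x \to \infty$,

$$\mathbf{P}(\chi_-^0 < -x) \sim \frac{F_-(x)}{1 - p}.$$

2. If $F_-(t) = e^{-\gamma t}V(t)$, $V(t) \in \mathcal{L}$, then

$$\mathbf{P}(\chi_-^0 < -x) \sim \frac{\mathbf{E}e^{-\gamma S}\, F_-(x)}{1 - p}.$$

3. If $F_- \in \mathcal{E}^*$ then

$$\mathbf{P}(\chi_-^0 < -x) \sim \frac{F_-(x)\,\mathbf{P}(S = 0)}{1 - p}.$$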

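Proof We make use of identity (12.3.3):

$$1 - \varphi_-^0(\lambda) = \frac{(1 - \varphi(\lambda))\,\mathbf{E}e^{i\lambda S}}{1 - p}, \qquad \varphi_-^0(\lambda) = \mathbf{E}e^{i\lambda\chi_-^0}.$$

This implies that $\mathbf{P}(\chi_-^0 < -x)$ is the weighted mean of the value $F_-(x + t)$ with the weight function $\mathbf{P}(S \in dt)/(1 - p)$:

$$\mathbf{P}(\chi_-^0 < -x) = \frac{1}{1 - p}\int_0^{\infty} \mathbf{P}(S \in dt)\, F_-(x + t).$$

From here the assertions of the theorem follow in an obvious way. □


If $\mathbf{E}\xi = 0$ then the asymptotics of $\mathbf{P}(\chi_-^0 < -x)$ will be different.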


12.7.2 The Asymptotics of $\mathbf{P}(S > x)$

We will study the asymptotics of $\mathbf{P}(S > x)$ in the two non-overlapping and mutually complementary cases where $F_+ \in \mathcal{E}$ (the Cramér condition holds) and where $F_+$ belongs to the class $\mathcal{S}$ of subexponential functions.

Definition 12.7.2 A distribution $G$ on $[0, \infty)$ with the tail $G(t) := G([t, \infty))$ belongs to the class $\mathcal{S}_+$ of subexponential distributions on the positive half-line if

$$G^{2*}(t) \sim 2G(t) \quad \text{as } t \to \infty. \tag{12.7.10}$$

A distribution $G$ on the whole real line belongs to the class $\mathcal{S}$ of subexponential distributions if the distribution $G^+$ of the positive part $\zeta^+ = \max\{0, \zeta\}$ of a random variable $\zeta$ with distribution $G$ belongs to $\mathcal{S}_+$. A random variable is called subexponential if its distribution is subexponential.
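
Relation (12.7.10) can be observed numerically on a concrete heavy-tailed law. The sketch below assumes, for illustration only, an integer-valued variable with the Pareto-type tail $\mathbf{P}(\zeta \ge t) = t^{-\alpha}$, $t = 1, 2, \ldots$, with $\alpha = 1.5$; the distribution and the helper names `tail` and `conv_tail` are assumptions, not objects from the text. The tail of the two-fold convolution is computed exactly by conditioning on the first summand.

```python
def tail(t, alpha=1.5):
    """G(t) = P(zeta >= t) = t^(-alpha) for integer t >= 1 (assumed discrete Pareto-type law)."""
    return float(t) ** (-alpha) if t >= 1 else 1.0

def conv_tail(t, alpha=1.5):
    """G^{2*}(t) = P(zeta_1 + zeta_2 >= t) for two independent copies of zeta."""
    total = tail(t, alpha)                      # contribution of the event {zeta_1 >= t}
    for j in range(1, t):                       # contribution of {zeta_1 = j}, j < t
        p_j = tail(j, alpha) - tail(j + 1, alpha)
        total += p_j * tail(t - j, alpha)       # then zeta_2 >= t - j is required
    return total

for t in (50, 500, 2000):
    # The printed ratio G^{2*}(t) / G(t) should approach 2 as t grows.
    print(t, conv_tail(t) / tail(t))
```

The convergence is slow (the deviation from 2 decays roughly like $1/t$ here), which is typical for regularly varying tails.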

As we will see later (see Theorem A6.4.3 in Appendix 6), the subexponentiality of a distribution $G$ is in essence a property of the asymptotics of the tail $G(t)$ as $t \to \infty$. Therefore we can also talk about subexponential functions. A nonincreasing function $G_1(t)$ on $(0, \infty)$ is called subexponential if the distribution $G$ with a tail $G(t)$ such that $G(t) \sim cG_1(t)$ as $t \to \infty$ for some $c > 0$ is subexponential. (For example, distributions with tails $G_1(t)/G_1(0)$ or $\min(1, G_1(t))$ if $G_1(0) > 1$.)

The properties of subexponential distributions are studied in Appendix 6. In particular, it is established that $\mathcal{S} \subset \mathcal{L}$, $\mathcal{R} \subset \mathcal{S}$ ($\mathcal{R}$ is the class of regularly varying functions) and that $G(t) = o(G^{I}(t))$ if $G^{I} \in \mathcal{S}$.
Theorem 12.7.3 If $F_+^{I}(t) \in \mathcal{S}$ and $a = \mathbf{E}\xi < 0$, then, as $x \to \infty$,

$$\mathbf{P}(S > x) \sim \frac{1}{|a|}\, F_+^{I}(x). \tag{12.7.11}$$

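Proof By the identity from Theorem 12.3.2,

$$\mathbf{E}e^{i\lambda S} = \frac{1 - p}{1 - p\varphi_\chi(\lambda)}, \qquad \varphi_\chi(\lambda) := \mathbf{E}e^{i\lambda\chi}, \tag{12.7.12}$$

we have

$$\mathbf{E}e^{i\lambda S} = (1 - p)\sum_{k=0}^{\infty} p^k \varphi_\chi^k(\lambda),$$

and hence, for $x > 0$,

$$\mathbf{P}(S > x) = (1 - p)\sum_{k=1}^{\infty} p^k\, \mathbf{P}(H_k > x), \qquad H_k := \sum_{j=1}^{k} \chi_j, \tag{12.7.13}$$

where $\chi_j$ are independent copies of $\chi$. By assertion 1 of Theorem 12.7.1 the distribution of $\chi$ is subexponential, while by Theorem A6.4.3 of Appendix 6, as $x \to \infty$, for each fixed $k$ one has

$$\mathbf{P}(H_k > x) \sim k\,\mathbf{P}(\chi > x). \tag{12.7.14}$$

Moreover, again by Theorem A6.4.3 of Appendix 6, for any $\varepsilon > 0$, there exists a $b = b(\varepsilon)$ such that, for all $x$ and $k \ge 2$,

$$\frac{\mathbf{P}(H_k > x)}{\mathbf{P}(\chi > x)} < b(1 + \varepsilon)^k.$$

Therefore, for $(1 + \varepsilon)p < 1$, the series

$$\sum_{k=1}^{\infty} p^k\, \frac{\mathbf{P}(H_k > x)}{\mathbf{P}(\chi > x)}$$

converges uniformly in $x$. Passing to the limit as $x \to \infty$, by virtue of (12.7.14) we obtain that

$$\lim_{x\to\infty} \frac{\mathbf{P}(S > x)}{\mathbf{P}(\chi > x)} = (1 - p)\sum_{k=1}^{\infty} kp^k = \frac{p}{1 - p}$$

or, which is the same, that

$$\mathbf{P}(S > x) \sim \frac{p\,\mathbf{P}(\chi > x)}{1 - p} \quad \text{as } x \to \infty,$$

where, by Theorem 12.7.1,

$$\mathbf{P}(\chi > x) \sim -\frac{F_+^{I}(x)}{p\,\mathbf{E}\chi_-^0}.$$

Since, by Corollary 12.2.3,

$$(1 - p)\,\mathbf{E}\chi_-^0 = \mathbf{E}\xi,$$

we obtain (12.7.11). The theorem is proved. □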


Now consider the case when $F$ satisfies the right-hand side Cramér condition ($F_+ \in \mathcal{C}$). For definiteness, we will again assume that the distribution $F$ is non-lattice. Furthermore, we will assume that there exists a $\mu_1 > 0$ such that

\[ \psi(\mu_1) := \mathbf{E}e^{\mu_1\xi} = 1, \qquad b := \mathbf{E}\xi e^{\mu_1\xi} = \psi'(\mu_1) < \infty. \tag{12.7.15} \]

In this case the Cramér transform of the distribution $F$ at the point $\mu_1$ will be of the form

\[ F_{(\mu_1)}(dt) = \frac{e^{\mu_1 t}F(dt)}{\psi(\mu_1)} = e^{\mu_1 t}F(dt). \tag{12.7.16} \]

A random variable $\xi_{(\mu_1)}$ with the distribution $F_{(\mu_1)}$ has, by (12.7.15), a finite expectation equal to $b$. Denote the size of the first overshoot of the level $x$ by a random walk with jumps $\xi_{(\mu_1)}$ by $\chi_{(\mu_1)}(x)$. By Corollary 10.4.1, the distribution of $\chi_{(\mu_1)}(x)$ converges, as $x \to \infty$, to the limiting distribution: $\chi_{(\mu_1)}(x) \Rightarrow \chi_{(\mu_1)}$, so that

\[ \mathbf{E}e^{-\mu_1\chi_{(\mu_1)}(x)} \to \mathbf{E}e^{-\mu_1\chi_{(\mu_1)}}. \tag{12.7.17} \]


Theorem 12.7.4 Let $F_+ \in \mathcal{C}$ and (12.7.15) be satisfied. Then, as $x \to \infty$,

\[ \mathbf{P}(S > x) \sim c\,e^{-\mu_1 x}, \tag{12.7.18} \]

where $c = \mathbf{E}e^{-\mu_1\chi_{(\mu_1)}} < 1$.

There is a somewhat different interpretation of the constant $c$ in Remark 15.2.3. Exact upper and lower bounds for $e^{\mu_1 x}\mathbf{P}(S > x)$ are contained in Theorem 15.3.5.

Note that the finiteness of $\mathbf{E}\xi < 0$ is not assumed in Theorem 12.7.4. In the arithmetic case, we have to consider only integer $x$.


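Proof Put $\eta(x) := \min\{n \ge 1 : S_n > x\}$, $X_n := x_1 + \cdots + x_n$ and $\overline{X}_n := \max_{k \le n} X_k$. Then

\[ \mathbf{P}(S > x) = \mathbf{P}(\eta(x) < \infty) = \sum_{n=1}^{\infty} \mathbf{P}(\eta(x) = n), \tag{12.7.19} \]

where

\[ \begin{aligned} \mathbf{P}(\eta(x) = n) &= \int\!\cdots\!\int F(dx_1)\cdots F(dx_n)\, I(\overline{X}_{n-1} \le x,\ X_n > x) \\ &= \int\!\cdots\!\int F_{(\mu_1)}(dx_1)\cdots F_{(\mu_1)}(dx_n)\, e^{-\mu_1 X_n} I(\overline{X}_{n-1} \le x,\ X_n > x) \\ &= \mathbf{E}_{(\mu_1)} e^{-\mu_1 S_n} I(\eta(x) = n). \end{aligned} \tag{12.7.20} \]

Here $\mathbf{E}_{(\mu_1)}$ denotes the expectation when taken assuming that the distribution of the summands $\xi_k$ is $F_{(\mu_1)}$. By the convexity of the function $\psi(\mu) = \mathbf{E}e^{\mu\xi}$,

\[ \mathbf{E}_{(\mu_1)}\xi = \int x e^{\mu_1 x} F(dx) = \psi'(\mu_1) = b > 0, \]

and hence

\[ \mathbf{P}_{(\mu_1)}(\eta(x) < \infty) = 1. \]

Therefore, returning to (12.7.19), we obtain

\[ \mathbf{P}(S > x) = \mathbf{E}_{(\mu_1)} \sum_{n=1}^{\infty} e^{-\mu_1 S_n} I(\eta(x) = n) = \mathbf{E}_{(\mu_1)} e^{-\mu_1 S_{\eta(x)}}, \tag{12.7.21} \]

where $S_{\eta(x)} = x + \chi_{(\mu_1)}(x)$ and, by (12.7.17),

\[ e^{\mu_1 x}\mathbf{P}(S > x) \to c = \mathbf{E}e^{-\mu_1\chi_{(\mu_1)}} < 1. \]

This proves (12.7.18). For arithmetic $\xi$ the proof is the same. We only have to replace $F(dt)$ in (12.7.15) and (12.7.16) by $p_k = \mathbf{P}(\xi = k)$, as well as integration by summation. The theorem is proved. □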

Corollary 12.7.1 If, in the arithmetic case, $\mathbf{E}\xi < 0$, $p_1 = \mathbf{P}(\xi = 1) > 0$, $\mathbf{P}(\xi \ge 2) = 0$, then the conditions of Theorem 12.7.4 are satisfied and one has

\[ \mathbf{P}(S > k) = e^{-\mu_1(k+1)}, \qquad k \ge 0. \]

Proof The proof follows immediately from (12.7.21) if we note that, in the case under consideration, $\chi_{(\mu_1)}(x) \equiv 1$ and $S_{\eta(x)} = x + 1$. This assertion repeats the result of Example 12.6.1. □
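
As a numerical illustration of Corollary 12.7.1 (a sketch only, not part of the original argument), take the simplest skip-free walk with $\mathbf{P}(\xi = 1) = p_{up}$ and $\mathbf{P}(\xi = -1) = 1 - p_{up}$, $p_{up} < 1/2$. For this walk the equation $\psi(\mu_1) = 1$ gives $e^{\mu_1} = (1 - p_{up})/p_{up}$. The function names and the truncation horizon `n_steps` below are illustrative assumptions; the code compares $e^{-\mu_1(k+1)}$ with the probability that the walk exceeds the level $k$ within `n_steps` steps, computed by dynamic programming.

```python
import math


def cramer_root(p_up: float) -> float:
    """Positive root mu_1 of p_up*e^mu + (1 - p_up)*e^(-mu) = 1 (needs p_up < 1/2)."""
    return math.log((1.0 - p_up) / p_up)


def prob_max_exceeds(k: int, p_up: float, n_steps: int = 400) -> float:
    """P(max_{j <= n_steps} S_j > k) for the +/-1 walk, by dynamic programming."""
    q = 1.0 - p_up
    target = k + 1      # skip-free walk: exceeding k means hitting k + 1 exactly
    alive = {0: 1.0}    # alive[s] = P(S_j = s and the level is not yet reached)
    hit = 0.0
    for _ in range(n_steps):
        nxt = {}
        for s, pr in alive.items():
            if s + 1 == target:
                hit += pr * p_up
            else:
                nxt[s + 1] = nxt.get(s + 1, 0.0) + pr * p_up
            nxt[s - 1] = nxt.get(s - 1, 0.0) + pr * q
        alive = nxt
    return hit


if __name__ == "__main__":
    p_up, k = 0.3, 2
    print(prob_max_exceeds(k, p_up), math.exp(-cramer_root(p_up) * (k + 1)))
```

With a negative drift the truncation error is negligible for a few hundred steps, so the two printed numbers should agree to many digits.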


Remark 12.7.1 The asymptotics (12.7.18), obtained by a probabilistic argument, admits a simple analytic interpretation. From (12.7.18) it follows that, as $\mu \uparrow \mu_1$, we have

\[ \mathbf{E}e^{\mu S} \sim \frac{c\mu_1}{\mu_1 - \mu}. \]

But that $\mathbf{E}e^{\mu S}$ has precisely this form follows from identity (12.3.3):

\[ \mathbf{E}e^{\mu S} = \frac{(1-p)(1 - \mathbf{E}e^{\mu\chi_-})}{1 - \psi(\mu)}. \]

Indeed, since, by assumption, $\psi(\mu) = \mathbf{E}e^{\mu\xi}$ is left-differentiable at the point $\mu_1$ and

\[ \psi(\mu) = 1 - b(\mu_1 - \mu) + o(\mu_1 - \mu), \tag{12.7.22} \]

one has

\[ \mathbf{E}e^{\mu S} \sim \frac{(1-p)(1 - \mathbf{E}e^{\mu_1\chi_-})}{b(\mu_1 - \mu)} \tag{12.7.23} \]

as $\mu \uparrow \mu_1$. This implies, in particular, yet another representation for the constant $c$ in (12.7.18):

\[ c = \frac{(1-p)(1 - \mathbf{E}e^{\mu_1\chi_-})}{b\mu_1}. \]

Since

\[ \mathbf{E}e^{\lambda S} = \frac{\mathfrak{w}_+(0)}{\mathfrak{w}_+(\lambda)} \]

and $\mathfrak{w}_+(\lambda)$ has a zero at the point $\mu_1$, we can obtain representations similar to (12.7.22) and (12.7.23) in terms of the values of $\mathfrak{w}_+(0)$ and $\mathfrak{w}_+'(\mu_1)$.

We should also note that the proof of asymptotics (12.7.18) with the help of relations of the form (12.7.23) is based on certain facts from mathematical analysis and is relatively simple only under the additional condition (12.5.1).

There are other ways to prove (12.7.18), but they also involve additional restrictions. For instance, (12.3.3) implies

\[ \mathbf{E}e^{i\lambda S} = (1-p)\sum_{k=0}^{\infty}\bigl[\varphi^k(\lambda) - \varphi^k(\lambda)\,\mathbf{E}e^{i\lambda\chi_-}\bigr], \]

\[ \begin{aligned} \mathbf{P}(S > x) &= (1-p)\sum_{k=0}^{\infty}\bigl[\mathbf{P}(S_k > x) - \mathbf{P}(S_k + \chi_- > x)\bigr] \\ &= (1-p)\int_0^{\infty} \mathbf{P}(-\chi_- \in dt) \sum_{k=0}^{\infty} \mathbf{P}\bigl(S_k \in (x, x+t]\bigr), \end{aligned} \]

and the problem now reduces to integro-local theorems for large deviations of $S_k$ (see Chap. 9) or to local theorems for the renewal function in the region where the function converges to zero.


12.7.3  The  Distribution  of  the  Maximal  Values  of  Generalised 
Renewal  Processes 


Let $\{(\tau_j, \zeta_j)\}$ be a sequence of independent identically distributed random vectors,

\[ Z(t) = Z_{\nu(t)}, \]

where

\[ \nu(t) := \max\{k : T_k \le t\}, \qquad T_k := \sum_{j=1}^{k} \tau_j. \]

In Sect. 12.4.3 we reduced the problem of finding the distribution of $\sup_t (Z(t) - qt)$ to that of the distribution of $S := \sup_{k \ge 0} S_k$, $S_k := \sum_{j=1}^{k} \xi_j$, $\xi_j := \zeta_j - q\tau_j$, in the case $q \ge 0$, $\zeta_j \ge 0$. We show that such a reduction takes place in the general case as well. If $q \ge 0$ and the $\zeta_j$ can take values of both signs, then the reduction is the same as in Sect. 12.4.3. Now if $q < 0$ then

\[ \begin{aligned} \sup_t (Z_{\nu(t)} - qt) &= \sup(-qT_1,\ Z_1 - qT_2,\ Z_2 - qT_3, \ldots) \\ &= -q\tau_1 + \sup_{k \ge 1}\bigl[Z_{k-1} - q(T_k - \tau_1)\bigr] = S - q\tau_1, \end{aligned} \]

where the random variables $\tau_1$ and $S$ are independent.


12.8  On  the  Distribution  of  the  First  Passage  Time 
12.8.1 The Properties of the Distributions of the Times $\eta_\pm$

In this section we will establish a number of relations between the random variables $\eta_\pm$ and the time $\theta$ when the global maximum $S = \sup S_k$ is attained for the first time:

\[ \theta := \min\{k : S_k = S\} \quad (\text{if } S < \infty \text{ a.s.}). \]

Put

\[ P(z) := \sum_{k=0}^{\infty} z^k \mathbf{P}(\eta_-^0 > k), \qquad q(z) := \mathbf{E}(z^{\eta_+} \mid \eta_+ < \infty), \qquad D_+ := \sum_{k=1}^{\infty} \frac{\mathbf{P}(S_k > 0)}{k}. \]

Further, let $\eta$ be a random variable with the distribution

\[ \mathbf{P}(\eta = k) = \mathbf{P}(\eta_+ = k \mid \eta_+ < \infty) \]

(and the generating function $q(z)$), $\eta_1, \eta_2, \ldots$ be independent copies of $\eta$,

\[ H_k := \eta_1 + \cdots + \eta_k, \qquad H_0 = 0, \]

and $\nu$ be a random variable independent of $\{\eta_j\}$ with the geometric distribution $\mathbf{P}(\nu = k) = (1-p)p^k$, $k \ge 0$.

Theorem 12.8.1 If $p = \mathbf{P}(\eta_+ < \infty) < 1$ then

1. \[ 1 - p = \frac{1}{\mathbf{E}\eta_-^0} = e^{-D_+}. \tag{12.8.1} \]

2. \[ P(z) = \frac{1}{1 - pq(z)} = \frac{\mathbf{E}z^{\theta}}{1-p}. \tag{12.8.2} \]

3. \[ \mathbf{P}(\eta_-^0 > n) = \frac{1}{1-p}\,\mathbf{P}(H_\nu = n) \ge \mathbf{P}(\eta_+ = n) \tag{12.8.3} \]

for all $n \ge 0$.

Recall that, for the condition $p < 1$ to hold, it is sufficient that $\mathbf{E}\xi < 0$ (see Corollary 12.2.6).

The second assertion of the theorem implies that the distributions of $\eta_-^0$, $\eta_+$ and $\theta$ uniquely determine each other, so that if at least one of them is known then, to find the other two, it is not necessary to know the original distribution $F$. In particular,

\[ \mathbf{P}(\theta = n) = (1-p)\,\mathbf{P}(\eta_-^0 > n). \]


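Proof of Theorem 12.8.1 The arguments in this subsection are based on the following identities which follow from Theorems 12.1.1-12.1.3 if we put there $\lambda = 0$ and $|z| < 1$:

\[ 1 - z = \bigl[1 - \mathbf{E}z^{\eta_-^0}\bigr]\bigl[1 - \mathbf{E}(z^{\eta_+};\ \eta_+ < \infty)\bigr], \tag{12.8.4} \]

\[ 1 - \mathbf{E}z^{\eta_-^0} = \exp\Bigl\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\,\mathbf{P}(S_k \le 0)\Bigr\}, \tag{12.8.5} \]

\[ 1 - \mathbf{E}(z^{\eta_+};\ \eta_+ < \infty) = \exp\Bigl\{-\sum_{k=1}^{\infty} \frac{z^k}{k}\,\mathbf{P}(S_k > 0)\Bigr\}. \tag{12.8.6} \]

Since

\[ \frac{1 - \mathbf{E}z^{\eta_-^0}}{1 - z} = P(z), \qquad P(1) = \mathbf{E}\eta_-^0, \]

we obtain from (12.8.4) the first equalities in (12.8.1) and (12.8.2). The second equality in (12.8.1) follows from (12.8.6).

To prove the second equality in (12.8.2), we make use of the relation

\[ \theta = \begin{cases} 0 & \text{on } \{\omega : \eta_+ = \infty\}, \\ \eta_+ + \theta^* & \text{on } \{\omega : \eta_+ < \infty\}, \end{cases} \]

where $\theta^*$ is distributed on $\{\eta_+ < \infty\}$ identically to $\theta$ and does not depend on $\eta_+$. It follows that

\[ \mathbf{E}z^{\theta} = (1-p) + \mathbf{E}z^{\theta}\,\mathbf{E}(z^{\eta_+};\ \eta_+ < \infty). \]

This implies the second equality in (12.8.2). The last assertion of the theorem follows from the first equality in (12.8.2), which implies

\[ \begin{aligned} P(z) = \sum_{k=0}^{\infty} p^k q^k(z) &= \frac{1}{1-p}\sum_{k=0}^{\infty} \mathbf{P}(\nu = k) \sum_{n=0}^{\infty} \mathbf{P}(H_k = n) z^n \\ &= \frac{1}{1-p}\sum_{n=0}^{\infty} z^n\,\mathbf{P}(H_\nu = n). \end{aligned} \]

The theorem is proved. □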
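
The identity $1 - p = e^{-D_+}$ in (12.8.1) can be checked numerically on a simple case. The sketch below assumes a walk with jumps $\pm 1$, $\mathbf{P}(\xi = 1) = p_{up} < 1/2$, for which $p = \mathbf{P}(\eta_+ < \infty) = p_{up}/(1 - p_{up})$ is the classical gambler's ruin value; the function names and the truncation point `k_max` of the series $D_+$ are illustrative assumptions.

```python
import math


def prob_positive(k: int, p_up: float) -> float:
    """P(S_k > 0) for a sum S_k of k independent +/-1 steps."""
    q = 1.0 - p_up
    # S_k = 2*U - k with U ~ Binomial(k, p_up), so S_k > 0 iff U > k/2.
    return sum(
        math.comb(k, u) * p_up**u * q ** (k - u)
        for u in range(k // 2 + 1, k + 1)
    )


def d_plus(p_up: float, k_max: int = 400) -> float:
    """Truncated series D_+ = sum_{k>=1} P(S_k > 0) / k."""
    return sum(prob_positive(k, p_up) / k for k in range(1, k_max + 1))


if __name__ == "__main__":
    p_up = 0.3
    p = p_up / (1.0 - p_up)          # P(eta_+ < infinity) for this walk
    print(1.0 - p, math.exp(-d_plus(p_up)))
```

For a negative drift the terms of $D_+$ decay exponentially, so a few hundred terms already reproduce $1 - p$ to high accuracy.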


The second equality in (12.8.2) and identity (12.7.12) mean that the representations

\[ \theta = \eta_1 + \cdots + \eta_\nu \quad \text{and} \quad S = \chi_1 + \cdots + \chi_\nu, \]

respectively, hold true, where $\nu$ has the geometric distribution $\mathbf{P}(\nu = k) = (1-p)p^k$, $k \ge 0$, and does not depend on $\{\eta_j\}$, $\{\chi_j\}$.

Note that the probabilities $\mathbf{P}(S_k > 0) = \mathbf{P}(S_k - ak > -ak)$ on the right-hand sides of (12.8.5) and (12.8.6) are, for large $k$ and $a = \mathbf{E}\xi < 0$, the probabilities of large deviations that were studied in Chap. 9. The results of that chapter on the asymptotics of these probabilities together with relations (12.8.5) and (12.8.6) give us an opportunity to find the asymptotics of $\mathbf{P}(\eta_+ = n)$ and $\mathbf{P}(\eta_-^0 = n)$ as $n \to \infty$ (see [8]).

Now consider the case where both random variables $\eta_-^0$ and $\eta_+$ are proper. That is always the case if $\mathbf{E}\xi = 0$ (see Corollary 12.2.6). Here identities (12.8.4)-(12.8.6) hold true (with $\mathbf{P}(\eta_+ < \infty) = 1$). As before, (12.8.4) implies that the distributions of $\eta_-^0$ and $\eta_+$ uniquely determine each other.

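Let $\eta_1, \eta_2, \ldots$ be independent copies of $\eta_+$, $H_k = \eta_1 + \cdots + \eta_k$ and $H_0 = 0$. For the sums $H_k$, define the local renewal function

\[ h_n := \sum_{k=0}^{\infty} \mathbf{P}(H_k = n). \]

Theorem 12.8.2 If $\mathbf{P}(\eta_-^0 < \infty) = \mathbf{P}(\eta_+ < \infty) = 1$ then:

1. $\mathbf{E}\eta_-^0 = \mathbf{E}\eta_+ = \infty$.

2. $\mathbf{P}(\eta_-^0 > n) = h_n$.

Proof From (12.8.4) it follows that

\[ P(z) = \frac{1 - \mathbf{E}z^{\eta_-^0}}{1 - z} = \frac{1}{1 - \mathbf{E}z^{\eta_+}} \to \infty \tag{12.8.7} \]

as $z \to 1$. Since $P(z) \to \mathbf{E}\eta_-^0$ as $z \uparrow 1$, we have proved that $\mathbf{E}\eta_-^0$ is infinite. That $\mathbf{E}\eta_+$ is also infinite is shown in the same way. The second assertion also follows from (12.8.7) since the right-hand side of (12.8.7) is $\sum_{n=0}^{\infty} z^n h_n$. The theorem is proved. □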


Now we turn to the important class of symmetric distributions. We will say that the distribution of a random variable $\xi$ is symmetric if it coincides with the distribution of $-\xi$, and will call the distribution of $\xi$ continuous if the distribution function of $\xi$ is continuous. For such random variables, $\mathbf{E}\xi = 0$ (if $\mathbf{E}\xi$ exists), the distributions of $S_n$ are also symmetric continuous for all $n$, and

\[ \mathbf{P}(S_n > 0) = \mathbf{P}(S_n < 0) = \frac{1}{2}, \qquad \mathbf{P}(S_n = 0) = 0, \]

and  hence  D(z)  =  1,  P(x+  =  0)  =  0,  and  77 +  =  x+  =  X+  with  probability  1. 


Theorem 12.8.3 If the distribution of $\xi$ is symmetric and continuous then

\[ \mathbf{P}(\eta_+ = n) = \mathbf{P}(\eta_-^0 = n) = \frac{(2n)!}{(2n-1)(n!)^2\, 2^{2n}} \sim \frac{1}{2\sqrt{\pi}\, n^{3/2}}, \]

\[ \mathbf{P}(\gamma_n > 0) = \mathbf{P}(\zeta_n < 0) \sim \frac{1}{\sqrt{\pi n}} \tag{12.8.8} \]

as $n \to \infty$ ($\gamma_n$ and $\zeta_n$ are defined in Section 12.1.3).

Proof Since $\mathbf{E}z^{\eta_-^0} = \mathbf{E}z^{\eta_+}$, by virtue of (12.8.4) one has

\[ 1 - \mathbf{E}z^{\eta_+} = \sqrt{1 - z}. \]

Expanding $\sqrt{1 - z}$ into a series, we obtain the second equality in (12.8.8). The asymptotic equivalence follows from Stirling's formula.

The second assertion of the theorem follows from the first one and the equality

\[ \mathbf{P}(\zeta_n < 0) = \sum_{k=n+1}^{\infty} \mathbf{P}(\eta_+ = k). \]

The assertions concerning $\eta_-^0$ and $\gamma_n$ follow by symmetry.

The theorem is proved. □
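
The explicit formula and its asymptotics are easy to tabulate. The following sketch (function names are illustrative assumptions) evaluates the coefficients of $1 - \sqrt{1 - z}$ exactly with rational arithmetic, compares them with $1/(2\sqrt{\pi}\,n^{3/2})$, and checks the tail sum against the closed form $\binom{2n}{n}2^{-2n}$, which is the coefficient of $z^n$ in $(1-z)^{-1/2}$.

```python
import math
from fractions import Fraction


def first_ladder_prob(n: int) -> Fraction:
    """(2n)! / ((2n - 1) (n!)^2 2^(2n)), the n-th coefficient of 1 - sqrt(1 - z)."""
    return Fraction(math.comb(2 * n, n), (2 * n - 1) * 4**n)


def ladder_asymptotic(n: int) -> float:
    """Leading term 1 / (2 sqrt(pi) n^(3/2)) given by Stirling's formula."""
    return 1.0 / (2.0 * math.sqrt(math.pi) * n**1.5)


def tail_prob(n: int) -> Fraction:
    """Sum of the coefficients with index k > n, as 1 minus the partial sum."""
    return 1 - sum(first_ladder_prob(k) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, float(first_ladder_prob(n)), ladder_asymptotic(n))
```

The ratio of the exact value to the asymptotic one tends to 1 at rate $O(1/n)$, which is visible already for moderate $n$.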

Note that, under the conditions of Theorem 12.8.3, the distributions of the variables $\eta_+$, $\eta_-^0$, $\gamma_n$, $\zeta_n$ do not depend on the distribution of $\xi$. Also note that the asymptotics

\[ \mathbf{P}(\eta_+ = n) \sim \frac{c}{n^{3/2}} \]

persists in the case of non-symmetric distributions as well provided that $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 < \infty$ (see [8]).


12.8.2  The  Distribution  of  the  First  Passage  Time  of  an  Arbitrary 
Level  x  by  Arithmetic  Skip-Free  Walks 

The main object in this section is the time

\[ \eta(x) = \min\{k : S_k \ge x\} \]

of the first passage of the level $x$ by the random walk $\{S_k\}$. Below we will consider the class of arithmetic random walks for which $\chi_+ \equiv 1$.

By an arithmetic skip-free walk we will call a sequence $\{S_k\}_{k=0}^{\infty}$, where the distribution of $\xi$ is arithmetic and $\max_\omega \xi(\omega) = 1$ (i.e. $p_1 > 0$ and $p_k = 0$ for $k \ge 2$, where $p_k = \mathbf{P}(\xi = k)$). The term “skip-free walk” appears due to the fact that the walk $\{S_k\}$, $k = 0, 1, \ldots$, cannot skip any integer level $x > 0$: if $S_n > x$ then necessarily there is a $k < n$ such that $S_k = x$.

As we already know from Example 12.6.1, for skip-free walks with $\mathbf{E}\xi < 0$ the distribution of $S$ is geometric:

\[ \mathbf{P}(S = k) = (1-p)p^k, \qquad k = 0, 1, \ldots, \]

where $p = \mathbf{P}(\eta_+ < \infty)$ and $z_1 = p^{-1}$ is the zero of the function $1 - p(z)$ with $p(z) = \sum_k p_k z^k$.

It turns out that one can find many other explicit formulas for skip-free walks. In this section we will be interested in the distribution of the maximum $\overline{S}_n = \max(0, S_1, \ldots, S_n)$; as we already noted, knowing the distribution is important for many problems of mathematical statistics, queueing theory, etc. Note that finding the distribution of $\overline{S}_n$ is the same as finding the distribution of $\eta(x)$, since

\[ \{\overline{S}_n < x\} = \{\eta(x) > n\}. \tag{12.8.9} \]

Here we put $\eta(x) := \infty$ if $S < x$.

The Pollaczek-Spitzer identity (see Theorem 12.3.1) provides the double transform of the distribution of $\overline{S}_n$. Analysing this identity shows that the distribution of $\overline{S}_n$ (or $\eta(x)$) itself typically cannot be expressed in terms of the distribution of $\xi$ in explicit form. However, for discrete skip-free walks one has remarkable “duality” relations which we will now prove with the help of the Pollaczek-Spitzer identity.

Theorem 12.8.4 If $\xi$ is integer-valued then $\mathbf{P}(\xi \ge 2) = 0$ is a necessary and sufficient condition for

\[ n\,\mathbf{P}(\eta(x) = n) = x\,\mathbf{P}(S_n = x), \qquad x \ge 1. \tag{12.8.10} \]

Using the Wald identity, it is also not hard to verify that if the expectation $\mathbf{E}\xi_1 = a > 0$ exists then the walk $\{S_k\}$ will be skip-free if and only if $\mathbf{E}\eta(x) = x/a$. (Note that the definition of $\eta(x)$ in this section somewhat differs from that in Chap. 10. One obtains it by changing $x$ to $x + 1$ on the right-hand side of the definition of $\eta(x)$ from Chap. 10.)

The asymptotics of the local probabilities $\mathbf{P}(S_n = x)$ was studied in Chap. 9 (see e.g., Theorem 9.3.4). This together with (12.8.10) enables us to find the asymptotics of $\mathbf{P}(\eta(x) = n)$.
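
Relation (12.8.10) lends itself to a direct numerical check. The sketch below is an illustration under assumed jump distributions (the dictionaries `SKIP_FREE` and `NOT_SKIP_FREE` and all function names are hypothetical); it computes both sides of (12.8.10) by dynamic programming for a skip-free walk and for a walk that can jump by 2.

```python
def step_convolve(dist, steps):
    """Distribution of the position after one more jump."""
    new = {}
    for s, pr in dist.items():
        for jump, pj in steps.items():
            new[s + jump] = new.get(s + jump, 0.0) + pr * pj
    return new


def first_passage_probs(x, steps, n_max):
    """[P(eta(x) = n) for n = 0..n_max], where eta(x) = min{k : S_k >= x}, x >= 1."""
    alive = {0: 1.0}    # mass that has stayed strictly below x so far
    out = [0.0]
    for _ in range(n_max):
        moved = step_convolve(alive, steps)
        out.append(sum(pr for s, pr in moved.items() if s >= x))
        alive = {s: pr for s, pr in moved.items() if s < x}
    return out


def local_probs(x, steps, n_max):
    """[P(S_n = x) for n = 0..n_max]."""
    dist = {0: 1.0}
    out = [dist.get(x, 0.0)]
    for _ in range(n_max):
        dist = step_convolve(dist, steps)
        out.append(dist.get(x, 0.0))
    return out


def duality_gap(x, steps, n_max=12):
    """max over 1 <= n <= n_max of |n P(eta(x) = n) - x P(S_n = x)|."""
    q = first_passage_probs(x, steps, n_max)
    loc = local_probs(x, steps, n_max)
    return max(abs(n * q[n] - x * loc[n]) for n in range(1, n_max + 1))


SKIP_FREE = {1: 0.3, 0: 0.2, -1: 0.3, -2: 0.2}    # P(xi >= 2) = 0
NOT_SKIP_FREE = {2: 0.2, 1: 0.2, -1: 0.6}         # P(xi >= 2) > 0

if __name__ == "__main__":
    print(duality_gap(2, SKIP_FREE), duality_gap(1, NOT_SKIP_FREE))
```

For the skip-free walk the gap is at the level of rounding error, while for the second walk it is already visible at $x = n = 1$, in line with the necessity part of the theorem.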


Proof of Theorem 12.8.4 Set

\[ r_x := \mathbf{P}(\eta(x) = \infty) = \mathbf{P}(S < x), \qquad q_{x,n} := \mathbf{P}(\eta(x) = n), \]

\[ Q_{x,n} := \mathbf{P}(\eta(x) > n) = \sum_{k=n+1}^{\infty} q_{x,k} + r_x. \]

Since for each $y$, $0 \le y \le x$,

\[ \{\eta(x) = n\} \subset \bigcup_{k=0}^{n} \{\eta(y) = k\}, \]

using the fact that the walk is skip-free, by the total probability formula one has

\[ q_{x,n} = \sum_{k=0}^{n} q_{y,k}\, q_{x-y,n-k}, \]

where $q_{0,0} = 1$, and $q_{y,0} = 0$ for $y > 0$. Hence for $|z| \le 1$ using convolution we have

\[ q_x(z) := \sum_{n=0}^{\infty} q_{x,n} z^n = \mathbf{E}(z^{\eta(x)};\ \eta(x) < \infty) = q_y(z)\, q_{x-y}(z). \]

Putting $y = 1$ and $q_0(z) = 1$, we obtain, with $q(z) := q_1(z)$,

\[ q_x(z) = q(z)\, q_{x-1}(z) = q^x(z), \qquad x \ge 0. \]

From here one can find the generating function $Q_x(z)$ of the sequence $Q_{x,n}$:

\[ \begin{aligned} Q_x(z) := \sum_{n=0}^{\infty} z^n \Bigl[r_x + \sum_{k=n+1}^{\infty} q_{x,k}\Bigr] &= \frac{r_x}{1 - z} + \sum_{k=1}^{\infty} q_{x,k} \sum_{n=0}^{k-1} z^n \\ &= \frac{r_x}{1 - z} + \sum_{k=1}^{\infty} q_{x,k}\, \frac{1 - z^k}{1 - z} = \frac{r_x + q_x(1) - q_x(z)}{1 - z} = \frac{1 - q_x(z)}{1 - z}. \end{aligned} \]

Note that here the quantity $q_x(1) = \mathbf{P}(\eta(x) < \infty) = \mathbf{P}(S \ge x)$ can be less than 1. Using (12.8.9) we obtain that

\[ \mathbf{P}(\overline{S}_n = x) = \mathbf{P}(\eta(x + 1) > n) - \mathbf{P}(\eta(x) > n), \]

\[ \sum_{n=0}^{\infty} z^n\, \mathbf{P}(\overline{S}_n = x) = \frac{(1 - q^{x+1}(z)) - (1 - q^x(z))}{1 - z} = \frac{q^x(z)(1 - q(z))}{1 - z}. \]

Finally, making use of the absolute summability of the series below, we find that, for $|v| < 1$ and $|z| < 1$,

\[ \sum_{n=0}^{\infty} z^n\, \mathbf{E}v^{\overline{S}_n} = \sum_{x=0}^{\infty} v^x \sum_{n=0}^{\infty} z^n\, \mathbf{P}(\overline{S}_n = x) = \frac{1 - q(z)}{(1 - z)(1 - vq(z))}. \]

Turning now to the Pollaczek-Spitzer formula, we can write that

\[ \sum_{n=1}^{\infty} \frac{z^n}{n}\, \mathbf{E}v^{\max(0, S_n)} = \ln\frac{1 - q(z)}{1 - z} - \ln(1 - vq(z)) = \ln\frac{1 - q(z)}{1 - z} + \sum_{x=1}^{\infty} \frac{(vq(z))^x}{x}. \]

Comparing the coefficients of $v^x$, $x \ge 1$, we obtain

\[ \sum_{n=1}^{\infty} \frac{z^n}{n}\, \mathbf{P}(S_n = x) = \frac{q^x(z)}{x}, \qquad x \ge 1. \tag{12.8.11} \]

Taking into account that $q^x(z) = q_x(z)$ and comparing the coefficients of $z^n$, $n \ge 1$, in (12.8.11) we get

\[ \frac{1}{n}\, \mathbf{P}(S_n = x) = \frac{1}{x}\, \mathbf{P}(\eta(x) = n), \qquad x \ge 1,\ n \ge 1. \]

Sufficiency is proved.

The necessity of the condition $\mathbf{P}(\xi \ge 2) = 0$ follows from equality (12.8.10) for $x = n = 1$:

\[ p_1 = q_{1,1} = \sum_{k=1}^{\infty} p_k, \qquad \sum_{k=2}^{\infty} p_k = \mathbf{P}(\xi \ge 2) = 0. \]

The theorem is proved. □

Using the obtained formulas one can, for instance, find in Example 4.2.3 the distribution of the time to ruin in a game with an infinitely rich adversary (the total capital being infinite). If the initial capital of the first player is $x$ then, for the time $\eta(x)$ of his ruin, we obtain

\[ \mathbf{P}(\eta(x) = n) = \frac{x}{n}\, \mathbf{P}(S_n = x), \]

where

\[ S_n = \sum_{j=1}^{n} \xi_j; \qquad \mathbf{P}(\xi_j = 1) = q, \quad \mathbf{P}(\xi_j = -1) = p \]

($p$ is the probability for the first player to win in a single play). Therefore, if $n$ and $x$ are both either odd or even then

\[ \mathbf{P}(\eta(x) = n) = \frac{x}{n} \binom{n}{(n - x)/2} q^{(n+x)/2} p^{(n-x)/2}, \tag{12.8.12} \]

and $\mathbf{P}(\eta(x) = n) = 0$ otherwise.
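
Formula (12.8.12) can be evaluated directly. The sketch below (the function name and the use of `math.lgamma` to avoid overflow of the binomial coefficient for large $n$ are implementation assumptions) computes the ruin-time probabilities and sums them to check that ruin is certain when $p < q$, whereas for $p > q$ the total mass equals the classical ruin probability $(q/p)^x$.

```python
import math


def ruin_time_prob(x: int, n: int, p: float) -> float:
    """P(eta(x) = n) from (12.8.12); p is the single-play win probability, 0 < p < 1."""
    q = 1.0 - p
    if n < x or (n - x) % 2:
        return 0.0                      # n and x must have the same parity
    downs = (n - x) // 2                # plays won by the first player
    ups = (n + x) // 2                  # plays lost by the first player
    log_binom = math.lgamma(n + 1) - math.lgamma(downs + 1) - math.lgamma(ups + 1)
    return x / n * math.exp(log_binom + ups * math.log(q) + downs * math.log(p))


if __name__ == "__main__":
    total = sum(ruin_time_prob(3, n, 0.3) for n in range(1, 601))
    print(total)                        # close to 1: ruin is certain when p < q
```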

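It is interesting to ask how fast $\mathbf{P}(\eta(x) > n)$ decreases as $n$ grows in the case when the player will be ruined with probability 1, i.e. when $\mathbf{P}(\eta(x) < \infty) = 1$. As we already know, this happens if and only if $p \le q$. (The assertion also follows from the results of Sect. 13.3.)

Applying Stirling’s formula, as was done when proving the local limit theorem for the Bernoulli scheme, it is not difficult to obtain from (12.8.12) that, for each fixed $x$, as $n \to \infty$ ($n$ and $x$ having the same parity), for $p < q$,

\[ \mathbf{P}(\eta(x) = n) \sim x\sqrt{\frac{2}{\pi}}\, \Bigl(\frac{q}{p}\Bigr)^{x/2} \frac{(4pq)^{n/2}}{n^{3/2}}, \qquad \mathbf{P}(\eta(x) > n) \sim x\sqrt{\frac{2}{\pi}}\, \Bigl(\frac{q}{p}\Bigr)^{x/2} \frac{(4pq)^{n/2+1}}{n^{3/2}(p - q)^2}, \]

and

\[ \mathbf{P}(\eta(x) > n) \sim x\sqrt{\frac{2}{\pi n}} \]

for $p = q$.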


The last relation allowed us, under the conditions of Sect. 8.8, to obtain the limiting distribution for the number of intersections of the trajectory $S_1, \ldots, S_n$ with the strip $[u, v]$ (see (8.8.24)). Up to the normalising constants, this assertion also remains true for arbitrary random walks such that $\mathbf{E}\xi_k = 0$ and $\mathbf{E}\xi_k^2 < \infty$. However, even in the case of a skip-free walk, the proof of this assertion requires additional efforts, despite the fact that, for such walks, an upward intersection of the line $x = 0$ by the trajectory $\{S_n\}$ divides the trajectory, as in Sect. 8.8, into independent identically distributed cycles.


Chapter  13 

Sequences  of  Dependent  Trials.  Markov  Chains 


Abstract The chapter opens with Sect. 13.1, which presents the key definitions and first examples of countable Markov chains. The section also contains the classification of states of the chain. Section 13.2 contains necessary and sufficient conditions for recurrence of states, the Solidarity Theorem for irreducible Markov chains and a theorem on the structure of a periodic Markov chain. Key theorems on random walks on lattices are presented in Sect. 13.3, along with those for a general symmetric random walk on the real line. The ergodic theorem for general countable homogeneous chains is established in Sect. 13.4, along with its special case for finite Markov chains and the Law of Large Numbers and the Central Limit Theorem for the number of visits to a given state. This is followed by a short Sect. 13.5 detailing the behaviour of transition probabilities for reducible chains. The last three sections are devoted to Markov chains with arbitrary state spaces. First the ergodicity of such chains possessing a positive atom is proved in Sect. 13.6, then the concept of Harris Markov chains is introduced and conditions of ergodicity of such chains are established in Sect. 13.7. Finally, the Laws of Large Numbers and the Central Limit Theorem for sums of random variables defined on a Markov chain are obtained in Sect. 13.8.


13.1  Countable  Markov  Chains.  Definitions  and  Examples. 
Classification  of  States 

13.1.1  Definition  and  Examples 

So  far  we  have  studied  sequences  of  independent  trials.  Now  we  will  consider  the 
simplest  variant  of  a  sequence  of  dependent  trials. 

Let $G$ be an experiment having a finite or countable set of outcomes $\{E_1, E_2, \ldots\}$. Suppose we keep repeating the experiment $G$. Denote by $X_n$ the number of the outcome of the $n$-th experiment.

In general, the probabilities of different values of $E_{X_n}$ can depend on what events occurred in the previous $n - 1$ trials. If this probability, given a fixed outcome $E_{X_{n-1}}$ of the $(n-1)$-st trial, does not depend on the outcomes of the preceding $n - 2$ trials, then one says that this sequence of trials forms a Markov chain.

To give a precise definition of a Markov chain, consider a sequence of integer-valued random variables $\{X_n\}_{n=0}^{\infty}$. If the $n$-th trial resulted in outcome $E_j$, we set $X_n := j$.

Definition 13.1.1 A sequence $\{X_n\}_0^{\infty}$ forms a Markov chain if

\[ \mathbf{P}(X_n = j \mid X_0 = k_0, X_1 = k_1, \ldots, X_{n-2} = k_{n-2}, X_{n-1} = i) = \mathbf{P}(X_n = j \mid X_{n-1} = i) =: p_{ij}^{(n)}. \tag{13.1.1} \]

These are the so-called countable (or discrete) Markov chains, i.e. Markov chains with countable state spaces.

Thus, a Markov chain may be thought of as a system with possible states $\{E_1, E_2, \ldots\}$. Some “initial” distribution of the variable $X_0$ is given:

\[ \mathbf{P}(X_0 = j) = p_j^0, \qquad \sum_j p_j^0 = 1. \]

Next, at integer time epochs the system changes its state, the conditional probability of being at state $E_j$ at time $n$ given the previous history of the system only being dependent on the state of the system at time $n - 1$. One can briefly characterise this property as follows: given the present, the future and the past of the sequence $X_n$ are independent.

For example, the branching process $\{\zeta_n\}$ described in Sect. 7.7, where $\zeta_n$ was the number of particles in the $n$-th generation, is a Markov chain with possible states $\{0, 1, 2, \ldots\}$.

In terms of conditional expectations or conditional probabilities (see Sect. 4.8), the Markov property (as we shall call property (13.1.1)) can also be written as

\[ \mathbf{P}\bigl(X_n = j \mid \sigma(X_0, \ldots, X_{n-1})\bigr) = \mathbf{P}\bigl(X_n = j \mid \sigma(X_{n-1})\bigr), \]

where $\sigma(\cdot)$ is the $\sigma$-algebra generated by random variables appearing in the argument, or, which is the same,

\[ \mathbf{P}(X_n = j \mid X_0, \ldots, X_{n-1}) = \mathbf{P}(X_n = j \mid X_{n-1}). \]

This definition allows immediate extension to the case of a Markov chain with a more general state space (see Sects. 13.6 and 13.7).

The problem of the existence of a sequence $\{X_n\}_0^{\infty}$ which is a Markov chain with given transition probabilities $p_{ij}^{(n)}$ ($p_{ij}^{(n)} \ge 0$, $\sum_j p_{ij}^{(n)} = 1$) and a given “initial” distribution $\{p_k^0\}$ of the variable $X_0$ can be solved in the same way as for independent random variables. It suffices to apply the Kolmogorov theorem (see Appendix 2) and specify consistent joint distributions by

\[ \mathbf{P}(X_0 = k_0, X_1 = k_1, \ldots, X_n = k_n) := p_{k_0}^0\, p_{k_0 k_1}^{(1)}\, p_{k_1 k_2}^{(2)} \cdots p_{k_{n-1} k_n}^{(n)}, \]

which are easily seen to satisfy the Markov property (13.1.1).

Definition 13.1.2 A Markov chain $\{X_n\}_0^{\infty}$ is said to be homogeneous if the probabilities $p_{ij}^{(n)}$ do not depend on $n$.

We  consider  several  examples. 

Example 13.1.1 (Walks with absorption and reflection) Let $a > 1$ be an integer. Consider a walk of a particle over integers between 0 and $a$. If $0 < k < a$, then from the point $k$ the particle goes to $k - 1$ or $k + 1$ with probabilities 1/2 each. If $k$ is equal to 0 or $a$, then the particle remains at the point $k$ with probability 1. This is the so-called walk with absorption. If $X_n$ is a random variable which is equal to the coordinate of the particle at time $n$, then the sequence $\{X_n\}$ forms a Markov chain, since the conditional distribution of the random variable $X_n$ given $X_0, X_1, \ldots, X_{n-1}$ depends only on the value of $X_{n-1}$. It is easy to see that this chain is homogeneous.

This walk can be used to describe a fair game (see Example 4.2.3) in the case when the total capital of both gamblers equals $a$. Reaching the point $a$ means the ruin of the second gambler.

On the other hand, if the particle goes from the point 0 to the point 1 with probability 1, and from the point $a$ to the point $a - 1$ with probability 1, then we have a walk with reflection. It is clear that in this case the positions $X_n$ of the particle also form a homogeneous Markov chain.

Example 13.1.2 Let $\{\xi_k\}_{k=0}^{\infty}$ be a sequence of independent integer-valued random variables and $d > 0$ be an integer. The random variables $X_n := \sum_{k=0}^{n} \xi_k \pmod d$ obtained by adding $\xi_k$ modulo $d$ ($X_n = \sum_{k=0}^{n} \xi_k - jd$, where $j$ is such that $0 \le X_n < d$) form a Markov chain. Indeed, we have $X_n = X_{n-1} + \xi_n \pmod d$, and therefore the conditional distribution of $X_n$ given $X_1, X_2, \ldots, X_{n-1}$ depends only on $X_{n-1}$.

If, in addition, $\{\xi_k\}$ are identically distributed, then this chain is homogeneous.

Of course, all the aforesaid also holds when $d = \infty$, i.e. for the conventional summation. The only difference is that the set of possible states of the system is in this case infinite.

From the definition of a homogeneous Markov chain it follows that the probabilities $p_{ij}^{(n)}$ of transition from state $E_i$ to state $E_j$ on the $n$-th step do not depend on $n$. Denote these probabilities by $p_{ij}$. They form the transition matrix $P = \|p_{ij}\|$ with the properties

\[ p_{ij} \ge 0, \qquad \sum_j p_{ij} = 1. \]

The second property is a consequence of the fact that the system, upon leaving the state $E_i$, enters with probability 1 one of the states $E_1, E_2, \ldots$.

Matrices with the above properties are said to be stochastic.

The matrix $P$ completely describes the law of change of the state of the system after one step. Now consider the change of the state of the system after $k$ steps. We introduce the notation $p_{ij}(k) := \mathbf{P}(X_k = j \mid X_0 = i)$. For $k > 1$, the total probability formula yields

\[ p_{ij}(k) = \sum_s \mathbf{P}(X_{k-1} = s \mid X_0 = i)\, p_{sj} = \sum_s p_{is}(k-1)\, p_{sj}. \]

Summation here is carried out over all states. If we denote by $P(k) := \|p_{ij}(k)\|$ the matrix of transition probabilities $p_{ij}(k)$, then the above equality means that $P(k) = P(k-1)P$ or, which is the same, that $P(k) = P^k$. Thus the matrix $P$ uniquely determines transition probabilities for any number of steps. It should be added here that, for a homogeneous chain,

\[ \mathbf{P}(X_{n+k} = j \mid X_n = i) = \mathbf{P}(X_k = j \mid X_0 = i) = p_{ij}(k). \]

We see from the aforesaid that the “distribution” of a chain will be completely determined by the matrix $P$ and the initial distribution $p_k^0 = \mathbf{P}(X_0 = k)$.

We leave it to the reader as an exercise to verify that, for an arbitrary $k \ge 1$ and sets $B_1, \ldots, B_{n-k}$,

\[ \mathbf{P}(X_n = j \mid X_{n-k} = i;\ X_{n-k-1} \in B_1, \ldots, X_0 \in B_{n-k}) = p_{ij}(k). \]

To prove this relation one can first verify it for $k = 1$ and then make use of induction.

It is obvious that a sequence of independent integer-valued identically distributed random variables $X_n$ forms a Markov chain with $p_{ij} = p_j = \mathbf{P}(X_n = j)$. Here one has $P(k) = P^k = P$.
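
The relation $P(k) = P^k$ can be illustrated on the walk with absorption from Example 13.1.1. The sketch below is only an illustration (the function names are assumptions); it uses exact rational arithmetic from the standard library so that the entries of $P^k$ can be compared with hand computations.

```python
from fractions import Fraction


def absorbing_walk_matrix(a: int):
    """Transition matrix of the walk with absorption on {0, 1, ..., a}."""
    half = Fraction(1, 2)
    P = [[Fraction(0)] * (a + 1) for _ in range(a + 1)]
    P[0][0] = P[a][a] = Fraction(1)          # absorbing end points
    for k in range(1, a):
        P[k][k - 1] = P[k][k + 1] = half     # fair step to a neighbour
    return P


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [
        [sum(A[i][s] * B[s][j] for s in range(n)) for j in range(n)]
        for i in range(n)
    ]


def k_step(P, k: int):
    """Matrix of k-step transition probabilities, P(k) = P^k."""
    n = len(P)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, P)
    return result


if __name__ == "__main__":
    P2 = k_step(absorbing_walk_matrix(4), 2)
    print([str(v) for v in P2[2]])   # two-step probabilities from state 2
```

Each row of $P^k$ again sums to 1, so $P(k)$ is a stochastic matrix for every $k$.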


13.1.2  Classification  of  States 

Definition  13.1.3 

K1. A state $E_i$ is called inessential if there exist a state $E_j$ and an integer $t_0 > 0$ such that $p_{ij}(t_0) > 0$ and $p_{ji}(t) = 0$ for every integer $t$.

Otherwise the state $E_i$ is called essential.

K2. Essential states $E_i$ and $E_j$ are called communicating if there exist such integers $t > 0$ and $s > 0$ that $p_{ij}(t) > 0$ and $p_{ji}(s) > 0$.

Example 13.1.3 Assume a system can be in one of the four states $\{E_1, E_2, E_3, E_4\}$ and has the transition matrix

\[ P = \begin{pmatrix} 0 & 1/2 & 1/2 & 0 \\ 1/2 & 0 & 0 & 1/2 \\ 0 & 0 & 1/2 & 1/2 \\ 0 & 0 & 1/2 & 1/2 \end{pmatrix}. \]

1 Here and in Sect. 12.2 we shall essentially follow the paper by A.N. Kolmogorov [23].

Fig. 13.1 Possible transitions and their probabilities in Example 13.1.3

In Fig. 13.1 the states are depicted by dots, transitions from state to state by arrows, numbers being the corresponding probabilities. In this chain, the states $E_1$ and $E_2$ are inessential while $E_3$ and $E_4$ are essential and communicating.
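
For a finite chain the classification K1-K2 reduces to a reachability computation on the graph of positive transition probabilities. The following sketch (function names and the zero-based state numbering are assumptions made for illustration) applies it to the matrix of Example 13.1.3.

```python
def reachable(P, i):
    """States j with p_ij(t) > 0 for some t >= 1 (depth-first search)."""
    seen, stack = set(), [i]
    while stack:
        s = stack.pop()
        for j, pr in enumerate(P[s]):
            if pr > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def classify(P):
    """Return (inessential states, list of classes of communicating essential states)."""
    n = len(P)
    reach = [reachable(P, i) for i in range(n)]
    # K1: i is inessential if some j is reachable from i but i is not reachable from j.
    inessential = sorted(
        i for i in range(n) if any(i not in reach[j] for j in reach[i])
    )
    classes, assigned = [], set(inessential)
    for i in range(n):
        if i not in assigned:
            # For an essential i, every reachable state communicates with i (K2).
            cls = sorted(reach[i] | {i})
            classes.append(cls)
            assigned.update(cls)
    return inessential, classes


# Matrix of Example 13.1.3; indices 0..3 stand for E_1..E_4.
EXAMPLE = [
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]

if __name__ == "__main__":
    print(classify(EXAMPLE))
```

A class consisting of a single state corresponds to an absorbing state, as for the end points of the walk with absorption.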

In  the  walk  with  absorption  described  in  Example  13.1.1,  the  states  1,2,..., 
a  —  1  are  inessential.  The  states  0  and  a  are  essential  but  non-communicating,  and  it 
is  natural  to  call  them  absorbing.  In  the  walk  with  reflection,  all  states  are  essential 
and  communicating. 

Let $\{X_n\}$ be a homogeneous Markov chain. We distinguish the class $S^0$ of
all inessential states. Let $E_i$ be an essential state. Denote by $S_{E_i}$ the class of states
comprising $E_i$ and all states communicating with it. If $E_j \in S_{E_i}$, then $E_j$ is essential
and communicating with $E_i$, and $E_i \in S_{E_j}$. Hence $S_{E_i} = S_{E_j}$. Thus, the whole set
of essential states can be decomposed into disjoint classes of communicating states
which will be denoted by $S^1, S^2, \ldots$

Definition 13.1.4 If the class $S_{E_i}$ consists of the single state $E_i$, then this state is
called absorbing.

It is clear that after a system has hit an essential state $E_i$, it can never leave
the class $S_{E_i}$.

Definition 13.1.5 A Markov chain consisting of a single class of essential communicating
states is said to be irreducible. A Markov chain is called reducible if it
contains more than one such class.

If we enumerate states so that the states from $S^0$ come first, next come states
from $S^1$ and so on, then the matrix of transition probabilities will have the form
shown in Fig. 13.2. Here the submatrices marked by zeros have only zero entries.
The cross-hatched submatrices are stochastic.

Each  such  submatrix  corresponds  to  some  irreducible  chain.  If,  at  some  time,  the 
system  is  at  a  state  of  such  an  irreducible  chain,  then  the  system  will  never  leave  this 
chain  in  the  future.  Hence,  to  study  the  dynamics  of  an  arbitrary  Markov  chain,  it 
is  sufficient  to  study  the  dynamics  of  irreducible  chains.  Therefore  one  of  the  basic 
objects  of  study  in  the  theory  of  Markov  chains  is  irreducible  Markov  chains.  We 
will  consider  them  now. 



Fig. 13.2 The structure of the matrix of transition probabilities of a general
Markov chain. The class $S^0$ consists of all inessential states, whereas
$S^1, S^2, \ldots$ are closed classes of communicating states


We introduce the following notation:

\[
f_j(n) := P(X_n = j,\ X_{n-1} \ne j, \ldots, X_1 \ne j \mid X_0 = j), \qquad F_j := \sum_{n=1}^{\infty} f_j(n);
\]

$f_j(n)$ is the probability that the system leaving the $j$-th state will return to it for the
first time after $n$ steps. The probability that the system leaving the $j$-th state will
eventually return to it is equal to $F_j$.

Definition  13.1.6 

K3. A state $E_j$ is said to be recurrent (or persistent) if $F_j = 1$, and transient if
$F_j < 1$.

K4. A state $E_j$ is called null if $p_{jj}(n) \to 0$ as $n \to \infty$, and positive otherwise.

K5. A state $E_j$ is called periodic with period $d_j$ if the recurrence with this state has
a positive probability only when the number of steps is a multiple of $d_j > 1$,
and $d_j$ is the maximum number having such property.

In other words, $d_j > 1$ is the greatest common divisor (g.c.d.) of the set of numbers
$\{n : f_j(n) > 0\}$. Note that one can always choose from this set a finite subset
$\{n_1, \ldots, n_k\}$ such that $d_j$ is the greatest common divisor of these numbers. It is also
clear that $p_{jj}(n) = f_j(n) = 0$ if $n \not\equiv 0 \pmod{d_j}$.
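For a finite chain the period can be estimated numerically. The sketch below is an illustration under stated assumptions: it takes the g.c.d. of the times $n$ with $p_{jj}(n) > 0$ rather than $f_j(n) > 0$ (every return time is a sum of first-return times, so both sets have the same g.c.d.), and it only inspects $n \le$ `n_max`, a truncation chosen here for convenience. The function name `period` and the two test matrices are hypothetical.

```python
from math import gcd

def period(P, j, n_max=50):
    """g.c.d. of all n <= n_max with p_jj(n) > 0 (returns 0 if no return is seen)."""
    size = len(P)
    support = {j}          # states reachable from j in exactly n steps
    d = 0
    for n in range(1, n_max + 1):
        support = {v for u in support for v in range(size) if P[u][v] > 0}
        if j in support:
            d = gcd(d, n)
    return d

# Walk with reflection on {0, 1, 2, 3}: returns are possible only at even times.
walk = [[0, 1, 0, 0],
        [0.5, 0, 0.5, 0],
        [0, 0.5, 0, 0.5],
        [0, 0, 1, 0]]
# A chain that may stay on the spot has g.c.d. 1, i.e. it is aperiodic.
lazy = [[0.5, 0.5],
        [0.5, 0.5]]
print(period(walk, 0), period(lazy, 0))
```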

Example 13.1.4 Consider a walk of a particle over integer points on the real line
defined as follows. The particle either takes one step to the right or remains on
the spot with probabilities 1/2. Here $f_j(1) = 1/2$, and if $n > 1$ then $f_j(n) = 0$ for
any point $j$. Therefore $F_j < 1$ and all the states are transient. It is easily seen that
$p_{jj}(n) = 1/2^n \to 0$ as $n \to \infty$ and hence every state is null.

On the other hand, if the particle jumps to the right with probability 1/2 and with
the same probability jumps to the left, then we have a chain with period 2, since
recurrence to any particular state is only possible in an even number of steps.


13.2 Necessary and Sufficient Conditions for Recurrence of States. Types of States in an Irreducible Chain. The Structure of a Periodic Chain


Recall that the function

\[
a(z) = \sum_{n=0}^{\infty} a_n z^n
\]

is called the generating function of the sequence $\{a_n\}_{n=0}^{\infty}$. Here $z$ is a complex variable.
If the sequence $\{a_n\}$ is bounded, then this series converges for $|z| < 1$.

Theorem 13.2.1 A state $E_j$ is recurrent if and only if $P_j = \sum_{n=1}^{\infty} p_{jj}(n) = \infty$. For
a transient $E_j$,

\[
F_j = \frac{P_j}{1 + P_j}. \tag{13.2.1}
\]

The assertion of this theorem is a kind of expansion of the Borel-Cantelli lemma
to the case of dependent events $A_n = \{X_n = j\}$. With probability 1 there occur
infinitely many events $A_n$ if and only if

\[
\sum_{n=1}^{\infty} P(A_n) = P_j = \infty.
\]


Proof By the total probability formula we have

\[
p_{jj}(n) = f_j(1) p_{jj}(n-1) + f_j(2) p_{jj}(n-2) + \cdots + f_j(n-1) p_{jj}(1) + f_j(n) \cdot 1.
\]

Introduce the generating functions of the sequences $\{p_{jj}(n)\}_{n=0}^{\infty}$ and $\{f_j(n)\}_{n=0}^{\infty}$:

\[
P_j(z) := \sum_{n=1}^{\infty} p_{jj}(n) z^n, \qquad F_j(z) := \sum_{n=1}^{\infty} f_j(n) z^n.
\]

Both series converge inside the unit circle and represent analytic functions. The
above formula for $p_{jj}(n)$, after multiplying both sides by $z^n$ and summing up over
$n$, leads (by the rule of convolution) to the equality

\[
P_j(z) = z f_j(1)\bigl(1 + P_j(z)\bigr) + z^2 f_j(2)\bigl(1 + P_j(z)\bigr) + \cdots = \bigl(1 + P_j(z)\bigr) F_j(z).
\]


Thus

\[
F_j(z) = \frac{P_j(z)}{1 + P_j(z)}, \qquad P_j(z) = \frac{F_j(z)}{1 - F_j(z)}.
\]

Assume that $P_j = \infty$. Then $P_j(z) \to \infty$ as $z \uparrow 1$ and therefore $F_j(z) \to 1$. Since
$F_j(z) \le F_j$ for real $z < 1$, we have $F_j = 1$ and hence $E_j$ is recurrent.

Now suppose that $F_j = 1$. Then $F_j(z) \to 1$ as $z \uparrow 1$, and so $P_j(z) \to \infty$. Therefore
$P_j = \infty$.

If $E_j$ is transient, it follows from the above that $P_j < \infty$, and setting $z := 1$
we obtain equality (13.2.1). □
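The convolution identity used in the proof can be inverted numerically to recover the first-return probabilities $f_j(n)$ from the return probabilities $p_{jj}(n)$. The following minimal sketch does this for the walk of Example 13.1.4, where $p_{jj}(n) = 1/2^n$, and compares $F_j$ with $P_j/(1 + P_j)$. The function name `first_return_probs` and the truncation level `N = 40` are assumptions made for illustration.

```python
from fractions import Fraction

def first_return_probs(p):
    """Given p[n] = p_jj(n) for n = 1..N (p[0] is unused), return f[n] = f_j(n)."""
    N = len(p) - 1
    f = [Fraction(0)] * (N + 1)
    for n in range(1, N + 1):
        # p_jj(n) = f(1) p(n-1) + ... + f(n-1) p(1) + f(n), solved for f(n)
        f[n] = p[n] - sum(f[m] * p[n - m] for m in range(1, n))
    return f

N = 40
p = [Fraction(0)] + [Fraction(1, 2**n) for n in range(1, N + 1)]  # Example 13.1.4
f = first_return_probs(p)
P_j = sum(p[1:])          # partial sum of the series defining P_j
F_j = sum(f[1:])
print(float(F_j), float(P_j / (1 + P_j)))
```

Here the recursion returns $f_j(1) = 1/2$ and $f_j(n) = 0$ for $n > 1$, so $F_j = 1/2$, while the truncated $P_j$ is within $2^{-40}$ of 1 and $P_j/(1 + P_j)$ is correspondingly close to 1/2, as (13.2.1) predicts.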

The quantity $P_j = \sum_{n=1}^{\infty} p_{jj}(n)$ can be interpreted as the mean number of visits
to the state $E_j$, provided that the initial state is also $E_j$. It follows from the fact that
the number of visits to the state $E_j$ can be represented as $\sum_{n=1}^{\infty} I(X_n = j)$, where,
as before, $I(A)$ is the indicator of the event $A$. Therefore the expectation of this
number is equal to

\[
E \sum_{n=1}^{\infty} I(X_n = j) = \sum_{n=1}^{\infty} E\, I(X_n = j) = \sum_{n=1}^{\infty} p_{jj}(n) = P_j.
\]

Theorem  13.2.1  implies  the  following  result. 

Corollary  13.2.1  A  transient  state  is  always  null. 

This is obvious, since it immediately follows from the convergence of the series
$\sum p_{jj}(n) < \infty$ that $p_{jj}(n) \to 0$.

Thus, based on definitions K3-K5, we could distinguish, in an irreducible chain,
8 possible types of states (each of the three properties can either be present or not).
But in reality there are only 6 possible types since transient states are automatically
null, and positive states are recurrent. These six types are generated by:

1) Classification by the asymptotic properties of the probabilities $p_{jj}(n)$ (transient,
recurrent null and positive states).

2) Classification by the arithmetic properties of the probabilities $p_{jj}(n)$ or $f_j(n)$
(periodic or aperiodic).

Theorem 13.2.2 (Solidarity Theorem) In an irreducible homogeneous Markov
chain all states are of the same type: if one is recurrent then all are recurrent, if
one is null then all are null, if one state is periodic with period $d$ then all states are
periodic with the same period $d$.

Proof Let $E_k$ and $E_j$ be two different states. There exist numbers $N$ and $M$ such
that

\[
p_{kj}(N) > 0, \qquad p_{jk}(M) > 0.
\]

The total probability formula

\[
p_{kk}(N + M + n) = \sum_{l,s} p_{kl}(N)\, p_{ls}(n)\, p_{sk}(M)
\]

implies the inequality

\[
p_{kk}(N + M + n) \ge p_{kj}(N)\, p_{jj}(n)\, p_{jk}(M) = \alpha\beta\, p_{jj}(n).
\]

Here $n > 0$ is an arbitrary integer, $\alpha = p_{kj}(N) > 0$, and $\beta = p_{jk}(M) > 0$. In the
same way one can obtain the inequality

\[
p_{jj}(N + M + n) \ge \alpha\beta\, p_{kk}(n).
\]

Hence

\[
\frac{1}{\alpha\beta}\, p_{kk}(N + M + n) \ge p_{jj}(n) \ge \alpha\beta\, p_{kk}(n - M - N). \tag{13.2.2}
\]

We see from these inequalities that the asymptotic properties of $p_{kk}(n)$ and
$p_{jj}(n)$ are the same. If $E_k$ is null, then $p_{kk}(n) \to 0$, therefore $p_{jj}(n) \to 0$ and
$E_j$ is also null. If $E_k$ is recurrent or, which is equivalent, $P_k = \sum_{n=1}^{\infty} p_{kk}(n) = \infty$,
then

\[
\sum_{n=M+N+1}^{\infty} p_{jj}(n) \ge \alpha\beta \sum_{n=M+N+1}^{\infty} p_{kk}(n - M - N) = \infty,
\]

and $E_j$ is also recurrent.

Suppose now that $E_k$ is a periodic state with period $d_k$. If $p_{kk}(n) > 0$, then $d_k$
divides $n$. We will write this as $d_k \mid n$. Since $p_{kk}(M + N) \ge \alpha\beta > 0$, then
$d_k \mid (M + N)$.

We now show that the state $E_j$ is also periodic and its period $d_j$ is equal to $d_k$.
Indeed, if $p_{jj}(n) > 0$ for some $n$, then by virtue of (13.2.2), $p_{kk}(n + M + N) > 0$.
Therefore $d_k \mid (n + M + N)$, and since $d_k \mid (M + N)$, $d_k \mid n$ and hence $d_k \le d_j$. In
a similar way one can prove that $d_j \le d_k$. Thus $d_j = d_k$. □

If  the  states  of  an  irreducible  Markov  chain  are  periodic  with  period  d  >  1 ,  then 
the  chain  is  called  periodic. 

We  will  now  show  that  the  study  of  periodic  chains  can  essentially  be  reduced  to 
the  study  of  aperiodic  chains. 

Theorem 13.2.3 If a Markov chain is periodic with period $d$, then the set of states
can be split into $d$ subclasses $\Psi_0, \Psi_1, \ldots, \Psi_{d-1}$ such that, with probability 1, in one
step the system passes from $\Psi_k$ to $\Psi_{k+1}$, and from $\Psi_{d-1}$ the system passes to $\Psi_0$.

Proof Choose some state, say, $E_1$. Based on this we will construct the subclasses
$\Psi_0, \Psi_1, \ldots, \Psi_{d-1}$ in the following way: $E_i \in \Psi_\alpha$, $0 \le \alpha \le d - 1$, if there exists an
integer $k > 0$ such that $p_{1i}(kd + \alpha) > 0$.

We show that no state can belong to two subclasses simultaneously. To this end
it suffices to prove that if $E_i \in \Psi_\alpha$ and $p_{1i}(s) > 0$ for some $s$, then $s \equiv \alpha \pmod{d}$.

Indeed, there exists a number $t_1 > 0$ such that $p_{i1}(t_1) > 0$. So, by the definition
of $\Psi_\alpha$, we have $p_{11}(kd + \alpha + t_1) > 0$. Moreover, $p_{11}(s + t_1) > 0$. Hence
$d \mid (kd + \alpha + t_1)$ and $d \mid (s + t_1)$. This implies $\alpha \equiv s \pmod{d}$.

Since starting from the state $E_1$ it is possible with positive probability to enter
any state $E_i$, the union $\bigcup_\alpha \Psi_\alpha$ contains all the states.

Fig. 13.3 The structure of the matrix of transition probabilities of a periodic
Markov chain: an illustration to the proof of Theorem 13.2.3


We now prove that in one step the system goes from $\Psi_\alpha$ with probability 1 to
$\Psi_{\alpha+1}$ (here the sum $\alpha + 1$ is modulo $d$). We have to show that, for $E_i \in \Psi_\alpha$,

\[
\sum_{E_j \in \Psi_{\alpha+1}} p_{ij} = 1.
\]

To do this, it suffices to prove that $p_{ij} = 0$ when $E_i \in \Psi_\alpha$, $E_j \notin \Psi_{\alpha+1}$.
If we assume the opposite ($p_{ij} > 0$) then, taking into account the inequality
$p_{1i}(kd + \alpha) > 0$, we have $p_{1j}(kd + \alpha + 1) > 0$ and consequently $E_j \in \Psi_{\alpha+1}$. This
contradiction completes the proof of the theorem. □
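The construction of the subclasses in the proof can be imitated for a finite chain by tracking, for every state, the residues modulo $d$ of the step counts at which it can be reached from a chosen starting state. The sketch below is illustrative only: the function name `cyclic_classes`, the convention of placing the starting state in class 0, and the example chain (a walk with reflection on four points, of period 2) are assumptions, not part of the original text.

```python
def cyclic_classes(P, d, start=0):
    """Map each state reachable from `start` to the residues a (mod d) such that
    the state can be reached from `start` in a number of steps congruent to a."""
    size = len(P)
    residues = {start: {0}}   # the starting state is placed in class 0 by convention
    frontier = [(start, 0)]
    while frontier:
        u, a = frontier.pop()
        for v in range(size):
            if P[u][v] > 0:
                b = (a + 1) % d
                if b not in residues.setdefault(v, set()):
                    residues[v].add(b)
                    frontier.append((v, b))
    return residues

# Walk with reflection on {0, 1, 2, 3}; it has period d = 2.
walk = [[0, 1, 0, 0],
        [0.5, 0, 0.5, 0],
        [0, 0.5, 0, 0.5],
        [0, 0, 1, 0]]
residues = cyclic_classes(walk, 2)
classes = {a: sorted(i for i, r in residues.items() if a in r) for a in range(2)}
print(classes)
```

When $d$ is the true period, every state receives exactly one residue, which mirrors the claim that no state belongs to two subclasses; for this walk the even and the odd points form the two subclasses, and each step moves the system from one to the other.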


We  see  from  the  theorem  that  the  matrix  of  a  periodic  chain  has  the  form  shown 
in  Fig.  13.3  where  non-zero  entries  can  only  be  in  the  shaded  cells. 

From a periodic Markov chain with period $d$ one can construct $d$ new Markov
chains. The states from the subset $\Psi_\alpha$ will be the states of the $\alpha$-th chain. Transition
probabilities are given by

\[
p^{\alpha}_{ij} := p_{ij}(d).
\]

By virtue of Theorem 13.2.3, $\sum_{E_j \in \Psi_\alpha} p^{\alpha}_{ij} = 1$. The new chains, to which one can
reduce in a certain sense the original one, will have no subclasses.


13.3  Theorems  on  Random  Walks  on  a  Lattice 

1.  A  random  walk  on  integer  points  on  the  line.  Imagine  a  particle  moving  on 
integer  points  of  the  real  line.  Transitions  from  one  point  to  another  occur  in  equal 
time  intervals.  In  one  step,  from  point  k  the  particle  goes  with  a  positive  probability 
p  to  the  point  k  +  1 ,  and  with  positive  probability  q  =  1  —  p  it  moves  to  the  point 
k  —  1.  As  was  already  mentioned,  to  this  physical  system  there  corresponds  the 
following  Markov  chain: 


\[
X_n = X_{n-1} + \xi_n = X_0 + S_n,
\]

where $\xi_n$ takes values 1 and $-1$ with probabilities $p$ and $q$, respectively, and
$S_n = \sum_{k=1}^{n} \xi_k$. The states of the chain are integer points on the line.

It is easy to see that returning to a given point with a positive probability is only
possible after an even number of steps, and $f_0(2) = 2pq > 0$. Therefore this chain
is periodic with period 2.

We  now  establish  conditions  under  which  the  random  walk  forms  a  recurrent 
chain. 

Theorem 13.3.1 The random walk $\{X_n\}$ forms a recurrent Markov chain if and only
if $p = q = 1/2$.


Proof Since $0 < p < 1$, the random walk is an irreducible Markov chain. Therefore
by Theorem 13.2.2 it suffices to examine the type of any given point, for example,
zero.

We will make use of Theorem 13.2.1. In order to do this, we have to investigate
the convergence of the series $\sum_{n=1}^{\infty} p_{00}(n)$. Since our chain is periodic with period
2, one has $p_{00}(2k + 1) = 0$. So it remains to compute $p_{00}(2k)$. The sum $S_n$ is
the coordinate of the walking particle after $n$ steps ($X_0 = 0$). Therefore $p_{00}(2k) =
P(S_{2k} = 0)$. The equality $S_{2k} = 0$ holds if $k$ of the random variables $\xi_j$ are equal
to 1 and the other $k$ are equal to $-1$ ($k$ steps to the right and $k$ steps to the left).
Therefore, by Theorem 5.2.1,

\[
P(S_{2k} = 0) \sim \frac{1}{\sqrt{\pi k}}\, e^{-2kH(1/2)} = \frac{1}{\sqrt{\pi k}}\, (4pq)^k.
\]

We now elucidate the behaviour of the function $\beta(p) = 4pq = 4p(1 - p)$ on the
interval $[0, 1]$. At the point $p = 1/2$ the function $\beta(p)$ attains its only extremum,
$\beta(1/2) = 1$. At all the other points of $[0, 1]$, $\beta(p) < 1$. Therefore $4pq < 1$ for
$p \ne 1/2$, which implies convergence of the series $\sum_{k=1}^{\infty} p_{00}(2k)$ and hence the transience
of the Markov chain. But if $p = 1/2$ then $p_{00}(2k) \sim 1/\sqrt{\pi k}$ and the series
$\sum_{k=1}^{\infty} p_{00}(2k)$ diverges, which implies, in turn, that all the states of the chain are
recurrent. The theorem is proved. □
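The dichotomy in the proof is easy to observe numerically, because $p_{00}(2k) = \binom{2k}{k}(pq)^k$ exactly. The sketch below is an illustration under assumptions: the function name `return_probs`, the sample values $p = 0.5$ and $p = 0.6$, and the truncation levels are chosen here for convenience. It compares the symmetric case with $1/\sqrt{\pi k}$ and sums the series in a biased case.

```python
from math import pi, sqrt

def return_probs(p, k_max):
    """Exact p_00(2k) = C(2k, k) (pq)^k for k = 1..k_max, computed iteratively."""
    q = 1.0 - p
    term, out = 1.0, []
    for k in range(1, k_max + 1):
        # C(2k, k) / C(2k-2, k-1) = 2k(2k-1) / k^2
        term *= (2 * k) * (2 * k - 1) / (k * k) * p * q
        out.append(term)
    return out

sym = return_probs(0.5, 1000)
print(sym[-1] * sqrt(pi * 1000))      # ratio to 1/sqrt(pi k) at k = 1000, close to 1
biased = return_probs(0.6, 2000)
print(sum(biased))                    # the partial sums stabilise: the series converges
```

For $p = 0.6$ the terms decay like $(0.96)^k$, so the truncated sum is already a good approximation to the finite mean number of returns; for $p = 1/2$ the terms decay only like $k^{-1/2}$ and the partial sums grow without bound.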


Theorem 13.3.1 allows us to make the following remark. If $p \ne 1/2$, then the
mean number of recurrences to 0 is finite, as it is equal to $\sum_{k=1}^{\infty} p_{00}(2k)$. This
means that, after a certain time, the particle will never return to zero. The particle
will “drift” to the right or to the left depending on whether $p$ is greater than 1/2 or
less. This can easily be obtained from the law of large numbers.

If $p = 1/2$, then the mean number of recurrences to 0 is infinite; the particle
has no “drift”. It is interesting to note that the increase in the mean number of recurrences
is not proportional to the number of steps. Indeed, the mean number of
recurrences over the first $2n$ steps is equal to $\sum_{k=1}^{n} p_{00}(2k)$. From the proof of Theorem
13.3.1 we know that $p_{00}(2k) \sim 1/\sqrt{\pi k}$. Therefore, as $n \to \infty$,

\[
\sum_{k=1}^{n} p_{00}(2k) \sim \sum_{k=1}^{n} \frac{1}{\sqrt{\pi k}} \sim \frac{2\sqrt{n}}{\sqrt{\pi}}.
\]

Thus, in the fair game considered in Example 4.2.2, the proportion of ties rapidly
decreases as the number of steps increases, and deviations are growing both in magnitude
and duration.
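The square-root growth of the mean number of returns can be checked directly from the exact values $p_{00}(2k) = \binom{2k}{k}/4^k$. The following is a small sketch; the function name `mean_returns` and the sample values of $n$ are illustrative assumptions.

```python
from math import pi, sqrt

def mean_returns(n):
    """Sum of p_00(2k) = C(2k, k) / 4^k over k = 1..n for the symmetric walk."""
    term, total = 1.0, 0.0
    for k in range(1, n + 1):
        term *= (2 * k - 1) / (2 * k)     # ratio of consecutive values of C(2k, k) / 4^k
        total += term
    return total

for n in (100, 10_000, 1_000_000):
    exact = mean_returns(n)
    # mean number of returns, its asymptotic value, and the proportion of ties in 2n steps
    print(n, exact, 2 * sqrt(n / pi), exact / (2 * n))
```

The second and third columns approach each other in relative terms, while the last column, the mean proportion of the first $2n$ steps spent at zero, tends to 0.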


13.3.1 Symmetric Random Walks in $\mathbb{R}^k$, $k \ge 2$

Consider the following random walk model in the $k$-dimensional Euclidean space
$\mathbb{R}^k$. If the walking particle is at point $(m_1, \ldots, m_k)$, then it can move with probabilities
$1/2^k$ to any of the $2^k$ vertices of the cube $|x_j - m_j| = 1$, i.e. the points
with coordinates $(m_1 \pm 1, \ldots, m_k \pm 1)$. It is natural to call this walk symmetric.
Denoting by $X_n$ the position of the particle after the $n$-th jump, we have, as before,
a sequence of $k$-dimensional random variables forming a homogeneous irreducible
Markov chain. We shall show that all states of the walk on the plane are, as in the
one-dimensional case, recurrent. In the three-dimensional space, the states will turn
out to be transient. Thus we shall prove the following assertion.

Theorem  13.3.2  The  symmetric  random  walk  is  recurrent  in  spaces  of  one  and  two 
dimensions  and  transient  in  spaces  of  three  or  more  dimensions. 

In this context, W. Feller made the sharp comment that the proverb “all roads
lead to Rome” is true only for two-dimensional surfaces. The assertion of Theorem
13.3.2 is adjacent to the famous theorem of Pólya on the transience of symmetric
walks in $\mathbb{R}^k$ for $k > 2$ when the particle jumps to neighbouring points along
the coordinate axes (so that $\xi_j$ assumes $2k$ values with probabilities $1/(2k)$ each). We
now turn to the proof of Theorem 13.3.2.

Proof of Theorem 13.3.2 Let $k = 2$. It is not difficult to see that our walk $X_n$ can be
represented as a sum of two independent components

\[
X_n = (X_n^1, 0) + (0, X_n^2), \qquad (X_0^1, X_0^2) = X_0,
\]

where $X_n^i$, $i = 1, 2$, are scalar (one-dimensional) sequences describing symmetric
independent random walks on the respective lines (axes). This is obvious, for the
two-dimensional sequence admits the representation

\[
X_{n+1} = X_n + \xi_n, \tag{13.3.1}
\]

where $\xi_n$ assumes 4 values $(\pm 1, 0) + (0, \pm 1) = (\pm 1, \pm 1)$ with probabilities 1/4
each.

With the help of representation (13.3.1) we can investigate the asymptotic behaviour
of the transition probabilities $p_{ij}(n)$. Let $X_0$ coincide with the origin $(0, 0)$.
Then

\[
\begin{aligned}
p_{00}(2n) &= P\bigl(X_{2n} = (0, 0) \mid X_0 = (0, 0)\bigr) \\
&= P(X_{2n}^1 = 0 \mid X_0^1 = 0)\, P(X_{2n}^2 = 0 \mid X_0^2 = 0) \sim (1/\sqrt{\pi n})^2 = 1/(\pi n).
\end{aligned}
\]

From this it follows that the series $\sum_{n=0}^{\infty} p_{00}(n)$ diverges and so all the states of our
chain are recurrent.

The case $k = 3$ should be treated in a similar way. Represent the sequence $X_n$ as
a sum of three independent components

\[
X_n = (X_n^1, 0, 0) + (0, X_n^2, 0) + (0, 0, X_n^3),
\]

where the $X_n^i$ are, as before, symmetric random walks on the real line. If we set
$X_0 = (0, 0, 0)$, then

\[
p_{00}(2n) = \bigl(P(X_{2n}^1 = 0 \mid X_0^1 = 0)\bigr)^3 \sim 1/(\pi n)^{3/2}.
\]

The series $\sum_{n=1}^{\infty} p_{00}(n)$ is convergent here, and hence the states of the chain are
transient. In contrast to the straight line and plane cases, a particle leaving the origin
will, with a positive probability, never come back.

It is evident that a similar situation takes place for walks in $k$-dimensional space
with $k > 3$, since $\sum_{n=1}^{\infty} (\pi n)^{-k/2} < \infty$ for $k \ge 3$. The theorem is proved. □
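The factorisation of $p_{00}(2n)$ into one-dimensional return probabilities can be verified by brute force for small $n$, and the resulting partial sums display the change of behaviour between dimensions 2 and 3. The sketch below is illustrative only; the function names `p00_product`, `p00_bruteforce` and `partial_sum`, as well as the truncation levels, are assumptions.

```python
from itertools import product
from math import comb

def p00_product(k, n):
    """p_00(2n) for the k-dimensional walk via independence of the k coordinates."""
    return (comb(2 * n, n) / 4**n) ** k

def p00_bruteforce(k, n):
    """Same probability by enumerating all (2^k)^(2n) paths of +-1 coordinate jumps."""
    jumps = list(product((-1, 1), repeat=k))
    hits = 0
    for path in product(jumps, repeat=2 * n):
        if all(sum(step[c] for step in path) == 0 for c in range(k)):
            hits += 1
    return hits / len(jumps) ** (2 * n)

def partial_sum(k, N):
    """Sum of p_00(2n) over n = 1..N, using the ratio of consecutive 1-d terms."""
    term, total = 1.0, 0.0
    for n in range(1, N + 1):
        term *= (2 * n - 1) / (2 * n)
        total += term ** k
    return total

print(p00_product(2, 2), p00_bruteforce(2, 2))
for k in (1, 2, 3):
    print(k, [round(partial_sum(k, N), 3) for N in (100, 10_000, 1_000_000)])
```

For $k = 1$ and $k = 2$ the partial sums keep growing (like $\sqrt{N}$ and $\ln N$ respectively), whereas for $k = 3$ they settle down, which is the numerical counterpart of Theorem 13.3.2.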


13.3.2  Arbitrary  Symmetric  Random  Walks  on  the  Line 

Let, as before,

\[
X_n = X_0 + \sum_{j=1}^{n} \xi_j, \tag{13.3.2}
\]

but now $\xi_j$ are arbitrary independent identically distributed integer-valued random
variables. Theorem 13.3.1 may be generalised in the following way:

Theorem 13.3.3 If the $\xi_j$ are symmetric and the expectation $E\xi_j$ exists (and hence
$E\xi_j = 0$) then the random walk $X_n$ forms a recurrent Markov chain with null states.

Proof It suffices to verify that

\[
\sum_{n=1}^{\infty} P(S_n = 0) = \infty,
\]

where $S_n = \sum_{j=1}^{n} \xi_j$, and that $P(S_n = 0) \to 0$ as $n \to \infty$. Put

\[
p(z) := E z^{\xi_1} = \sum_{k=-\infty}^{\infty} z^k P(\xi_1 = k).
\]

Then the generating function of $S_n$ will be equal to $E z^{S_n} = p^n(z)$, and by the inversion
formula (see Sect. 7.7)

\[
P(S_n = 0) = \frac{1}{2\pi i} \int_{|z|=1} p^n(z)\, z^{-1}\, dz, \tag{13.3.3}
\]

\[
\sum_{n=0}^{\infty} P(S_n = 0) = \frac{1}{2\pi i} \int_{|z|=1} \frac{dz}{z\,(1 - p(z))} = \frac{1}{\pi} \int_0^{\pi} \frac{dt}{1 - p(e^{it})}.
\]

The last equality holds since the real function $p(e^{it})$ is even and is obtained by substituting
$z = e^{it}$.

Since $E\xi_1 = 0$, one has $1 - p(e^{it}) = o(t)$ as $t \to 0$ and, for sufficiently small $\delta$
and $0 \le t < \delta$,

\[
0 \le 1 - p(e^{it}) < t
\]

(the function $p(e^{it})$ is real by virtue of the symmetry of $\xi_1$). This implies

\[
\int_0^{\delta} \frac{dt}{1 - p(e^{it})} = \infty.
\]

Convergence $P(S_n = 0) \to 0$ is a consequence of (13.3.3) since, for all $z$ on the
circle $|z| = 1$, with the possible exclusion of finitely many points, one has $p(z) < 1$
and hence $p^n(z) \to 0$ as $n \to \infty$. The theorem is proved. □


Theorem  13.3.3  can  be  supplemented  by  the  following  assertion. 


Theorem 13.3.4 Under the conditions of Theorem 13.3.3, if the g.c.d. of the possible
values of $\xi_j$ equals 1 then the set of values of $\{X_n\}$ constitutes a single class of
essential communicating states. This class coincides with the set of all integers.


The  assertion  of  the  theorem  follows  from  the  next  lemma. 


Lemma 13.3.1 If the g.c.d. of integers $a_1 > 0, \ldots, a_r > 0$ is equal to 1, then there
exists a number $K$ such that every natural $k \ge K$ can be represented as

\[
k = n_1 a_1 + \cdots + n_r a_r,
\]

where $n_i \ge 0$ are some integers.

Proof Consider the function $L(n) = n_1 a_1 + \cdots + n_r a_r$, where $n = (n_1, \ldots, n_r)$ is
a vector with integer (possibly negative) components. Let $d > 0$ be the minimal
natural number for which there exists a vector $n^0$ such that

\[
d = L(n^0).
\]

We show that every natural number that can be represented as $L(n)$ is divisible by $d$.
Suppose that this is not true. Then there exist $n$, $k$ and $0 < \alpha < d$ such that

\[
L(n) = kd + \alpha.
\]

But since the function $L(n)$ is linear,

\[
L(n - k n^0) = kd + \alpha - kd = \alpha < d,
\]

which contradicts the minimality of $d$ in the set of positive integer values of $L(n)$.

The numbers $a_1, \ldots, a_r$ are also the values of the function $L(n)$, so they are
divisible by $d$. The greatest common divisor of these numbers is by assumption
equal to one, so that $d = 1$.

Let $k$ be an arbitrary natural number. Denoting by $\theta < A$ the remainder after
dividing $k$ by $A := a_1 + \cdots + a_r$, we can write

\[
\begin{aligned}
k &= m(a_1 + \cdots + a_r) + \theta = m(a_1 + \cdots + a_r) + \theta L(n^0) \\
&= a_1(m + \theta n_1^0) + a_2(m + \theta n_2^0) + \cdots + a_r(m + \theta n_r^0),
\end{aligned}
\]

where $n_i := m + \theta n_i^0 \ge 0$, $i = 1, \ldots, r$, for sufficiently large $k$ (or $m$).

The lemma is proved. □
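The statement of Lemma 13.3.1 can be explored for concrete numbers with a short dynamic-programming check. The sketch below is an illustration; the function name `representable` and the sample sets $(3, 5)$ and $(4, 6)$ are assumptions.

```python
def representable(k, a):
    """True if k = n_1 a_1 + ... + n_r a_r for some integers n_i >= 0."""
    ok = [True] + [False] * k             # ok[t]: is t representable? (0 always is)
    for total in range(1, k + 1):
        ok[total] = any(total >= ai and ok[total - ai] for ai in a)
    return ok[k]

a = (3, 5)                                # g.c.d. equals 1
gaps = [k for k in range(1, 50) if not representable(k, a)]
print(gaps)                               # [1, 2, 4, 7]

b = (4, 6)                                # g.c.d. equals 2: odd numbers are never reached
print([k for k in range(40, 50) if not representable(k, b)])
```

For $a = (3, 5)$ every $k \ge 8$ is representable, so $K = 8$ works in the lemma; for $b = (4, 6)$ the assumption on the g.c.d. fails and no such $K$ exists.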

Proof of Theorem 13.3.4 Put $q_j := P(\xi = a_j) > 0$. Then, for each $k \ge K$, there
exists an $n$ such that $n_j \ge 0$, $\sum_{j=1}^{r} a_j n_j = k$ and hence, for $n = \sum_{j=1}^{r} n_j$, we have

\[
p_{0k}(n) \ge q_1^{n_1} \cdots q_r^{n_r} > 0.
\]

In other words, all the states $k \ge K$ are reachable from 0. Similarly, all the states
$k \le -K$ are reachable from 0. The states $k \in [-K, K]$ are reachable from the
point $-2K$ (which is reachable from 0). The theorem is proved. □

Corollary 13.3.1 If the conditions of Theorems 13.3.3 and 13.3.4 are satisfied, then
the chain (13.3.2) with an arbitrary initial state $X_0$ visits every state $k$ infinitely
many times with probability 1. In particular, for any $X_0$ and $k$, the random variable
$\nu = \min\{n : X_n = k\}$ will be proper.

If we are interested in investigating the periodicity of the chain (13.3.2), then
more detailed information on the set of possible values of $\xi_j$ is needed. We leave
it to the reader to verify that, for example, if this set is of the form $\{a + a_k d\}$,
$k = 1, 2, \ldots$, $d \ge 1$, g.c.d. $(a_1, a_2, \ldots) = 1$, g.c.d. $(a, d) = 1$, then the chain will be
periodic with period $d$.


13.4 Limit Theorems for Countable Homogeneous Chains

13.4.1 Ergodic Theorems

Now  we  return  to  arbitrary  countable  homogeneous  Markov  chains.  We  will  need 
the  following  conditions: 


(I) There exists a state $E_s$ such that the recurrence time $\tau^{(s)}$ to $E_s$ ($P(\tau^{(s)} = n) =
f_s(n)$) has finite expectation $E\tau^{(s)} < \infty$.

(II) The chain is irreducible.

(III) The chain is aperiodic.

We introduce the so-called “taboo probabilities” $P_i(n, j)$ of transition from $E_i$
to $E_j$ in $n$ steps without visiting the “forbidden” state $E_i$:

\[
P_i(n, j) := P(X_n = j;\ X_1 \ne i, \ldots, X_{n-1} \ne i \mid X_0 = i).
\]

Theorem 13.4.1 (The ergodic theorem) Conditions (I)-(III) are necessary and sufficient
for the existence, for all $i$ and $j$, of the positive limits

\[
\lim_{n \to \infty} p_{ij}(n) = \pi_j > 0, \qquad i, j = 0, 1, 2, \ldots. \tag{13.4.1}
\]

The sequence of values $\{\pi_j\}$ is the unique solution of the system

\[
\sum_{j=0}^{\infty} \pi_j = 1, \qquad \pi_j = \sum_{k=0}^{\infty} \pi_k p_{kj}, \quad j = 0, 1, 2, \ldots, \tag{13.4.2}
\]

in the class of absolutely convergent series.

Moreover, $E\tau^{(j)} < \infty$ for all $j$, and the quantities $\pi_j = (E\tau^{(j)})^{-1}$ admit the
representation

\[
\pi_j = \bigl(E\tau^{(j)}\bigr)^{-1} = \bigl(E\tau^{(s)}\bigr)^{-1} \sum_{k=1}^{\infty} P_s(k, j) \tag{13.4.3}
\]

for any $s$.


Definition  13.4.1  A  chain  possessing  property  (13.4.1)  is  called  ergodic. 
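Property (13.4.1) and system (13.4.2) can be observed on a small finite chain by raising the transition matrix to a high power. The following is a minimal sketch; the three-state matrix `P`, the helper names `mat_mul` and `n_step`, and the number of steps (60) are assumptions chosen for illustration. The chain is irreducible and aperiodic, and its limiting probabilities are $(1/4, 1/2, 1/4)$.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def n_step(P, n):
    """Matrix of n-step transition probabilities p_ij(n), n >= 1."""
    result = P
    for _ in range(n - 1):
        result = mat_mul(result, P)
    return result

P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
limit = n_step(P, 60)
for row in limit:
    print([round(x, 6) for x in row])      # all rows nearly coincide

pi_row = limit[0]
# Invariance (13.4.2): pi_j = sum_k pi_k p_kj
check = [sum(pi_row[k] * P[k][j] for k in range(3)) for j in range(3)]
print([round(x, 6) for x in check])
```

All rows of the 60-step matrix agree to many decimal places, which reflects the loss of dependence on the initial state, and the common row is reproduced by one more step of the chain.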


The numbers $\pi_j$ are essentially the probabilities that the system will be in the
respective states $E_j$ after a long period of time has passed. It turns out that these
probabilities lose dependence on the initial state of the system. The system “forgets”
where it began its motion. The distribution $\{\pi_j\}$ is called stationary or invariant.
Property (13.4.2) expresses the invariance of the distribution with respect to the
transition probabilities $p_{ij}$. In other words, if $P(X_n = k) = \pi_k$, then $P(X_{n+1} = k) =
\sum_j \pi_j p_{jk}$ is also equal to $\pi_k$.

Proof of Theorem 13.4.1 Sufficiency in the first assertion of the theorem. Consider
the “trajectory” of the Markov chain starting at a fixed state $E_s$. Let $\tau_1 \ge 1$, $\tau_2 \ge 1$,
... be the time intervals between successive returns of the system to $E_s$. Since after
each return the evolution of the system begins anew from the same state, by the
Markov property the durations $\tau_k$ of the cycles (as well as the cycles themselves)
are independent and identically distributed, $\tau_k \overset{d}{=} \tau^{(s)}$. Moreover, it is obvious that

\[
P(\tau_k = n) = P(\tau^{(s)} = n) = f_s(n).
\]

Recurrence of $E_s$ means that the $\tau_k$ are proper random variables. Aperiodicity
of $E_s$ means that the g.c.d. of all possible values of $\tau_k$ is equal to 1. Since

\[
p_{ss}(n) = P(\gamma(n) = 0),
\]

where $\gamma(n)$ is the defect of level $n$ for the renewal process $\{T_k\}$,

\[
T_k = \sum_{i=1}^{k} \tau_i,
\]

by Theorem 10.3.1 the following limit exists

\[
\lim_{n \to \infty} p_{ss}(n) = \lim_{n \to \infty} P(\gamma(n) = 0) = \frac{1}{E\tau_1} > 0. \tag{13.4.4}
\]

Now prove the existence of $\lim_{n \to \infty} p_{sj}(n)$ for $j \ne s$. If $\gamma(n)$ is the defect of level
$n$ for the walk $\{T_k\}$ then, by the total probability formula,

\[
p_{sj}(n) = \sum_{k=1}^{n} P(\gamma(n) = k)\, P(X_n = j \mid X_0 = s,\ \gamma(n) = k). \tag{13.4.5}
\]

Note that the second factors in the terms on the right-hand side of this formula do
not depend on $n$ by the Markov property:

\[
\begin{aligned}
P(X_n = j \mid X_0 = s,\ \gamma(n) = k)
&= P(X_n = j \mid X_0 = s,\ X_{n-1} \ne s, \ldots, X_{n-k+1} \ne s,\ X_{n-k} = s) \\
&= P(X_k = j \mid X_0 = s,\ X_1 \ne s, \ldots, X_{k-1} \ne s) = \frac{P_s(k, j)}{P(\tau_1 \ge k)}
\end{aligned}
\tag{13.4.6}
\]

since, for a fixed $X_0 = s$,

\[
P(X_k = j \mid X_1 \ne s, \ldots, X_{k-1} \ne s) = \frac{P(X_k = j,\ X_1 \ne s, \ldots, X_{k-1} \ne s)}{P(X_1 \ne s, \ldots, X_{k-1} \ne s)} = \frac{P_s(k, j)}{P(\tau^{(s)} \ge k)}.
\]

For the sake of brevity, put $P(\tau_1 > k) = P_k$, so that $P(\tau_1 \ge k) = P_{k-1}$. The first
factors in (13.4.5) converge, as $n \to \infty$, to $P_{k-1}/E\tau_1$ and, by virtue of the equality

\[
P(\gamma(n) = k) = P(\gamma(n - k) = 0)\, P_{k-1} \le P_{k-1}, \tag{13.4.7}
\]

are dominated by the convergent sequence $P_{k-1}$. Therefore, by the dominated convergence
theorem, the following limit exists

\[
\lim_{n \to \infty} p_{sj}(n) = \sum_{k=1}^{\infty} \frac{P_{k-1}}{E\tau_1}\, \frac{P_s(k, j)}{P(\tau_1 \ge k)} = \frac{1}{E\tau_1} \sum_{k=1}^{\infty} P_s(k, j) =: \pi_j, \tag{13.4.8}
\]

and we have, by (13.4.5)-(13.4.7),

\[
p_{sj}(n) \le \sum_{k=1}^{n} P_s(k, j) \le \sum_{k=1}^{\infty} P_s(k, j) = \pi_j E\tau_1. \tag{13.4.9}
\]


To establish that, for any $i$,

\[
\lim_{n \to \infty} p_{ij}(n) = \pi_j > 0,
\]

we first show that the system departing from $E_i$ will, with probability 1, eventually
reach $E_s$.

In other words, if $f_{is}(n)$ is the probability that the system, upon leaving $E_i$, hits
$E_s$ for the first time on the $n$-th step then

\[
\sum_{n=1}^{\infty} f_{is}(n) = 1.
\]

Indeed, both states $E_i$ and $E_s$ are recurrent. Consider the cycles formed by subsequent
visits of the system to the state $E_i$. Denote by $A_k$ the event that the system
is in the state $E_s$ at least once during the $k$-th cycle. By the Markov property the
events $A_k$ are independent and $P(A_k) > 0$ does not depend on $k$. Therefore, by the
Borel-Cantelli zero-one law (see Sect. 11.1), with probability 1 there will occur
infinitely many events $A_k$ and hence $P(\bigcup A_k) = 1$.

By the total probability formula,

\[
p_{ij}(n) = \sum_{k=1}^{n} f_{is}(k)\, p_{sj}(n - k),
\]

and the dominated convergence theorem yields

\[
\lim_{n \to \infty} p_{ij}(n) = \sum_{k=1}^{\infty} f_{is}(k)\, \pi_j = \pi_j.
\]

Representation (13.4.3) follows from (13.4.8).

Now we will prove the necessity in the first assertion of the theorem. That conditions
(II)-(III) are necessary is obvious, since $p_{ij}(n) > 0$ for every $i$ and $j$ if $n$
is large enough. The necessity of condition (I) follows from the fact that equalities
(13.4.4) are valid for $E_s$. The first part of the theorem is proved.

It remains to prove the second part of the theorem. Since

\[
\sum_j p_{sj}(n) = 1,
\]

one has $\sum_j \pi_j \le 1$. By virtue of the inequalities $p_{sj}(n) \le \pi_j E\tau_1$ (see (13.4.9)),
we can use the dominated convergence theorem both in the last equality and in the
equality $p_{sj}(n + 1) = \sum_{k=0}^{\infty} p_{sk}(n)\, p_{kj}$ which yields

\[
\sum_j \pi_j = 1, \qquad \pi_j = \sum_{k=0}^{\infty} \pi_k p_{kj}.
\]

It remains to show that the system has a unique solution. Let the numbers $\{q_j\}$ also
satisfy (13.4.2) and assume the series $\sum |q_j|$ converges. Then, changing the order
of summation, we obtain that

\[
\begin{aligned}
q_j &= \sum_k q_k p_{kj} = \sum_k p_{kj} \Bigl( \sum_l q_l p_{lk} \Bigr) = \sum_l q_l \sum_k p_{lk} p_{kj} = \sum_l q_l p_{lj}(2) \\
&= \sum_l p_{lj}(2) \Bigl( \sum_m q_m p_{ml} \Bigr) = \sum_m q_m p_{mj}(3) = \cdots = \sum_k q_k p_{kj}(n)
\end{aligned}
\]

for any $n$. Since $\sum q_k = 1$, passing to the limit as $n \to \infty$ gives

\[
q_j = \sum_k q_k \pi_j = \pi_j.
\]

The  theorem  is  proved.  □ 

If a Markov chain is periodic with period $d$, then $p_{ij}(t) = 0$ for $t \ne kd$ and every pair of states $E_i$ and $E_j$ belonging to the same subclass (see Theorem 13.2.3). But if $t = kd$, then from the theorem just proved and Theorem 13.2.3 it follows that the limit $\lim_{k\to\infty} p_{ij}(kd) = \pi_j > 0$ exists and does not depend on $i$.

Verifying conditions (II)–(III) of Theorem 13.4.1 usually presents no serious difficulties. The main difficulties would be related to verifying condition (I). For finite Markov chains, this condition is always met.

Theorem 13.4.2 Let a Markov chain have finitely many states and satisfy conditions (II)–(III). Then there exist $c > 0$ and $q < 1$ such that, for the recurrence time $\tau$ to an arbitrary fixed state, one has

$$\mathbf{P}(\tau > n) < c\,q^n, \qquad n \ge 1. \tag{13.4.10}$$

These inequalities clearly mean that condition (I) is always met for finite chains and hence the ergodic theorem for them holds if and only if conditions (II)–(III) are satisfied.

Proof Consider a state $E_s$ and put

$$r_j(n) := \mathbf{P}(X_k \ne s,\ k = 1, 2, \ldots, n \mid X_0 = j).$$

Then, if the chain has $m$ states, one has $r_j(m) < 1$ for any $j$. Indeed, $r_j(n)$ does not grow as $n$ increases. Let $N$ be the smallest number satisfying $r_j(N) < 1$. This means that there exists a sequence of states $E_j, E_{j_1}, \ldots, E_{j_N}$ such that $E_{j_N} = E_s$ and the probability of this sequence $p_{j j_1} \cdots p_{j_{N-1} j_N}$ is positive. But it is easy to see that $N \le m$, since otherwise this sequence would contain at least two identical states. Therefore the cycle contained between these states could be removed from the sequence, which could only increase its probability. Thus

$$r_j(m) < 1, \qquad r(m) := \max_j r_j(m) < 1.$$

Moreover, $r_j(n_1 + n_2) \le r_j(n_1)\, r(n_2) \le r(n_1)\, r(n_2)$.

It remains to note that if $\tau$ is the recurrence time to $E_s$, then $\mathbf{P}(\tau > nm) = r_s(nm) \le r(m)^n$. The statement of the theorem follows. □
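The bound in the proof can be checked numerically on a small chain. The sketch below is only an illustration under assumptions: the 3-state matrix `P` and the helper name `taboo_tails` are hypothetical and not taken from the text. It computes $r_j(n)$ from the recursion $r_j(n+1) = \sum_{k \ne s} p_{jk}\, r_k(n)$ and compares $r_s(nm)$ with $r(m)^n$.

```python
def taboo_tails(P, s, n):
    """Return [r_j(n)] where r_j(n) = P(X_k != s for k = 1..n | X_0 = j)."""
    m = len(P)
    r = [1.0] * m  # r_j(0) = 1 for every j
    for _ in range(n):
        # one more step that must avoid the state s
        r = [sum(P[j][k] * r[k] for k in range(m) if k != s) for j in range(m)]
    return r

# Hypothetical irreducible aperiodic chain with m = 3 states.
P = [
    [0.6, 0.2, 0.2],
    [0.5, 0.2, 0.3],
    [0.4, 0.2, 0.4],
]
s, m = 0, len(P)
r_m = max(taboo_tails(P, s, m))  # r(m) = max_j r_j(m)

# P(tau > n*m) = r_s(n*m) should not exceed r(m)**n
bounds_hold = all(
    taboo_tails(P, s, n * m)[s] <= r_m ** n + 1e-12 for n in range(1, 6)
)
```

Here `r_m` plays the role of $r(m)$, so the geometric rate in (13.4.10) can be taken as $q = r(m)^{1/m}$.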


Remark 13.4.1 Condition (13.4.10) implies the exponential rate of convergence of the differences $|p_{ij}(n) - \pi_j|$ to zero. One can verify this by making use of the analyticity of the function

$$F_s(z) = \sum_{n=1}^{\infty} f_s(n) z^n$$

in the domain $|z| < q^{-1}$, $q^{-1} > 1$, and of the equality

$$P_s(z) = \sum_{n=1}^{\infty} p_{ss}(n) z^n = \frac{1}{1 - F_s(z)} - 1 \tag{13.4.11}$$

(see Theorem 13.2.1; we assume that the $\tau$ in condition (13.4.10) refers to the state $E_s$, so that $f_s(n) = \mathbf{P}(\tau = n)$). Since $F_s'(1) = \mathbf{E}\tau = 1/\pi_s$, one has

$$F_s(z) = 1 + \frac{z-1}{\pi_s} + O\big((z-1)^2\big) \quad \text{as } z \to 1,$$

and from (13.4.11) it follows that the function

$$P_s(z) - \frac{\pi_s z}{1 - z} = \sum_{n=1}^{\infty} \big(p_{ss}(n) - \pi_s\big) z^n$$

is analytic in the disk $|z| \le 1 + \varepsilon$, $\varepsilon > 0$. It evidently follows from this that

$$|p_{ss}(n) - \pi_s| < c\,(1+\varepsilon)^{-n}, \qquad c = \text{const}.$$

Now we will give two examples of finite Markov chains.

Example 13.4.1 Suppose that the behaviour of two chess players A and B playing in a multi-player tournament can be described as follows. Independently of the outcomes of the previous games, player A wins every new game with probability $p$, loses with probability $q$, and makes a tie with probability $r = 1 - p - q$. Player B is less balanced. He wins a game with probabilities $p + \varepsilon$, $p$ and $p - \varepsilon$, respectively, if he won, made a tie, or lost in the previous one. The probability that he loses behaves in a similar way: in the above three cases, it equals $q - \varepsilon$, $q$ and $q + \varepsilon$, respectively. Which of the players A and B will score more points in a long tournament?

To answer this question, we will need to compute the stationary probabilities $\pi_1, \pi_2, \pi_3$ of the states $E_1, E_2, E_3$ which represent a win, tie, and loss in a game, respectively (cf. the law of large numbers at the end of this section).

For player A, the Markov chain with states $E_1, E_2, E_3$ describing his performance in the tournament will have the matrix of transition probabilities

$$P_A = \begin{pmatrix} p & r & q \\ p & r & q \\ p & r & q \end{pmatrix}.$$

It is obvious that $\pi_1 = p$, $\pi_2 = r$, $\pi_3 = q$ here.

For player B, the matrix of transition probabilities is equal to

$$P_B = \begin{pmatrix} p+\varepsilon & r & q-\varepsilon \\ p & r & q \\ p-\varepsilon & r & q+\varepsilon \end{pmatrix}.$$

Equations for stationary probabilities in this case have the form

$$
\begin{aligned}
\pi_1(p+\varepsilon) + \pi_2 p + \pi_3(p-\varepsilon) &= \pi_1,\\
\pi_1 r + \pi_2 r + \pi_3 r &= \pi_2,\\
\pi_1 + \pi_2 + \pi_3 &= 1.
\end{aligned}
$$

Solving this system we find that

$$\pi_2 - r = 0, \qquad \pi_1 - p = \varepsilon\,\frac{p-q}{1-2\varepsilon}.$$

Thus, the long run proportions of ties will be the same for both players, and B will have a greater proportion of wins if $\varepsilon > 0$, $p > q$ or $\varepsilon < 0$, $p < q$. If $p = q$, then the stationary distributions will be the same for both A and B.
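The closed-form answer for player B can be compared with a direct numerical computation. The following is a minimal sketch, assuming illustrative values $p = 0.5$, $q = 0.3$, $\varepsilon = 0.1$ (these numbers and the function names are not from the text); the stationary vector is approximated by iterating $\pi \leftarrow \pi P$.

```python
def stationary(P, iterations=200):
    """Approximate the stationary row vector by repeated multiplication pi <- pi P."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
    return pi

def player_b_matrix(p, q, eps):
    """Transition matrix of player B over the states (win, tie, loss)."""
    r = 1.0 - p - q
    return [
        [p + eps, r, q - eps],
        [p,       r, q],
        [p - eps, r, q + eps],
    ]

p, q, eps = 0.5, 0.3, 0.1          # illustrative values only
r = 1.0 - p - q
pi_b = stationary(player_b_matrix(p, q, eps))
predicted_pi1 = p + eps * (p - q) / (1.0 - 2.0 * eps)
```

With these values `pi_b[0]` should agree with `predicted_pi1` and `pi_b[1]` with `r`, in line with the formulas above.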

Example 13.4.2 Consider the summation of independent integer-valued random variables $\xi_1, \xi_2, \ldots$ modulo some $d > 1$ (see Example 13.1.2). Set $X_0 := 0$, $X_1 := \xi_1 - \lfloor \xi_1/d \rfloor d$, $X_2 := X_1 + \xi_2 - \lfloor (X_1 + \xi_2)/d \rfloor d$ etc. (here $\lfloor x \rfloor$ denotes the integral part of $x$), so that $X_n$ is the remainder of the division of $X_{n-1} + \xi_n$ by $d$. Such summation is sometimes also called summation on a circle (points 0 and $d$ are glued together in a single point). Without loss of generality, we can evidently suppose that $\xi_k$ takes the values $0, 1, \ldots, d-1$ only. If $\mathbf{P}(\xi_k = j) = p_j$ then

$$p_{ij} = \mathbf{P}(X_n = j \mid X_{n-1} = i) = \begin{cases} p_{j-i} & \text{if } j \ge i,\\ p_{d+j-i} & \text{if } j < i. \end{cases}$$

Assume that the set of all indices $k$ with $p_k > 0$ has a g.c.d. equal to 1. Then it is clear that the chain $\{X_n\}$ has a single class of essential states without subclasses, and there will exist the limits

$$\lim_{n\to\infty} p_{ij}(n) = \pi_j$$

satisfying the system $\sum_i \pi_i p_{ij} = \pi_j$, $\sum_j \pi_j = 1$, $j = 0, \ldots, d-1$. Now note that the stochastic matrix of transition probabilities $\|p_{ij}\|$ has in this case the following property:

$$\sum_i p_{ij} = \sum_j p_{ij} = 1.$$

Such matrices are called doubly stochastic. Stationary distributions for them are always uniform, since $\pi_j = 1/d$ satisfy the system for final probabilities.

Thus summation of arbitrary random variables on a circle leads to the uniform limit distribution. The rate of convergence of $p_{ij}(k)$ to the stationary distribution is exponential.
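The convergence to the uniform law is easy to observe numerically. The sketch below assumes a hypothetical step distribution on $\{0, \ldots, 4\}$ (so $d = 5$) with $\mathbf{P}(\xi = 1) = 0.7$ and $\mathbf{P}(\xi = 2) = 0.3$; the function names are illustrative. It propagates the distribution of $X_n$ starting from $X_0 = 0$ and records the largest deviation from $1/d$.

```python
def circle_step(dist, p):
    """One step of summation modulo d: new[j] = sum_i dist[i] * p[(j - i) mod d]."""
    d = len(dist)
    return [sum(dist[i] * p[(j - i) % d] for i in range(d)) for j in range(d)]

def distribution_after(n, p):
    """Distribution of X_n when X_0 = 0 and the steps have law p."""
    d = len(p)
    dist = [1.0] + [0.0] * (d - 1)
    for _ in range(n):
        dist = circle_step(dist, p)
    return dist

def max_deviation(n, p):
    """Largest distance between P(X_n = j) and the uniform value 1/d."""
    d = len(p)
    return max(abs(x - 1.0 / d) for x in distribution_after(n, p))

p = [0.0, 0.7, 0.3, 0.0, 0.0]   # hypothetical step law, d = 5
dev_20, dev_40, dev_200 = (max_deviation(n, p) for n in (20, 40, 200))
```

The deviations shrink geometrically in `n`, which is the exponential rate mentioned above.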

It is not difficult to see that the convolution of two uniform distributions under addition modulo $d$ is also uniform. The uniform distribution is in this sense stable. Moreover, the convolution of an arbitrary distribution with the uniform distribution will also be uniform. Indeed, if $\eta$ is uniformly distributed and independent of $\xi_1$, then (addition and subtraction are modulo $d$, $p_j = \mathbf{P}(\xi_1 = j)$)

$$\mathbf{P}(\xi_1 + \eta = k) = \sum_{j=0}^{d-1} p_j\, \mathbf{P}(\eta = k - j) = \sum_{j=0}^{d-1} p_j\, \frac{1}{d} = \frac{1}{d}.$$


Thus,  if  one  transmits  a  certain  signal  taking  d  possible  values  (for  example, 
letters)  and  (uniform)  “random”  noise  is  superimposed  on  it,  then  the  received  signal 
will  also  have  the  uniform  distribution  and  therefore  will  contain  no  information 
about  the  transmitted  signal.  This  fact  is  widely  used  in  cryptography. 
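As a small numerical illustration of this masking effect (a sketch with made-up numbers, not part of the original example), one can convolve an arbitrary signal distribution with uniform noise modulo $d$ and observe that the result is uniform.

```python
def add_mod_d(p, q):
    """Distribution of (xi + eta) mod d for independent xi ~ p and eta ~ q."""
    d = len(p)
    return [sum(p[j] * q[(k - j) % d] for j in range(d)) for k in range(d)]

signal = [0.5, 0.25, 0.125, 0.125]   # hypothetical law of the transmitted symbol, d = 4
noise = [0.25] * 4                   # uniform noise on {0, 1, 2, 3}
received = add_mod_d(signal, noise)
```

Every entry of `received` equals $1/d$ whatever `signal` is, so the received symbol carries no information about the transmitted one.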

This  example  also  deserves  attention  as  a  simple  illustration  of  laws  that  appear 
when  summing  random  variables  taking  values  not  in  the  real  line  but  in  some  group 
(the  set  of  numbers  0,  1, . . . ,  d  —  1  with  addition  modulo  d  forms  a  finite  Abelian 
group).  It  turns  out  that  the  phenomenon  discovered  in  the  example — the  uniformity 
of  the  limit  distribution — holds  for  a  much  broader  class  of  groups. 

We  return  to  arbitrary  countable  chains.  We  have  already  mentioned  that  the  main 
difficulties  when  verifying  the  conditions  of  Theorem  13.4.1  are  usually  related  to 
condition (I). We consider this problem in Sect. 13.7 in more detail for a wider
class  of  chains  (see  Theorems  13.7.2-13.7.3  and  corollaries  thereafter).  Sometimes 
condition  (I)  can  easily  be  verified  using  the  results  of  Chaps.  10  and  12. 

Example 13.4.3 We saw in Sect. 12.5 that waiting times in the queueing system satisfy the relationships

$$X_{n+1} = \max(X_n + \xi_{n+1}, 0), \qquad X_1 = 0,$$

where the $\xi_n$ are independent and identically distributed. Clearly, $X_n$ form a homogeneous Markov chain with the state space $\{0, 1, \ldots\}$, provided that the $\xi_k$ are integer-valued. The sequence $X_n$ may be interpreted as a walk with a delaying screen at the point 0. If $\mathbf{E}\xi_k < 0$ then it is not hard to derive from the theorems of Chap. 10 (see also Sect. 13.7) that the recurrence time to 0 has finite expectation. Thus, applying the ergodic theorem we can, independently of Sect. 11.4, come to the conclusion that there exists a limiting (stationary) distribution for $X_n$ as $n \to \infty$ (or, taking into account what we said in Sect. 11.4, conclude that $\sup_{k \ge 0} S_k$ is finite, where $S_k = \sum_{j=1}^{k} \xi_j$, which is essentially the assertion of Theorem 10.2.1).
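For a concrete feel of this limiting distribution, consider the simplest special case, chosen here only for illustration: $\xi_k = +1$ with probability $p$ and $-1$ with probability $1 - p$, $p < 1/2$. For this walk the balance equations $\pi_k\, p = \pi_{k+1}(1-p)$ give the geometric law $\pi_k = (1 - \rho)\rho^k$ with $\rho = p/(1-p)$. The sketch below (function names are hypothetical) propagates the exact distribution of the chain and compares it with that law.

```python
def step(dist, p):
    """One step of X_{n+1} = max(X_n + xi, 0) with xi = +1 w.p. p and -1 w.p. 1 - p."""
    new = [0.0] * (len(dist) + 1)
    for k, mass in enumerate(dist):
        new[k + 1] += mass * p                  # step up
        new[max(k - 1, 0)] += mass * (1.0 - p)  # step down, held at the screen 0
    return new

def distribution_after(n, p):
    """Exact distribution of the chain after n steps when it starts at 0."""
    dist = [1.0]
    for _ in range(n):
        dist = step(dist, p)
    return dist

p = 0.3                      # illustrative value, so that E xi = 2p - 1 < 0
rho = p / (1.0 - p)
dist = distribution_after(400, p)
geometric = [(1.0 - rho) * rho ** k for k in range(5)]
```

After 400 steps the first few entries of `dist` are already very close to `geometric`.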

Now  we  will  make  several  remarks  allowing  us  to  state  one  more  criterion  for 
ergodicity  which  is  related  to  the  existence  of  a  solution  to  Eq.  (13.4.2). 

First of all, note that Theorem 13.2.2 (the solidarity theorem) can now be complemented as follows. A state $E_j$ is said to be ergodic if, for any $i$, $p_{ij}(n) \to \pi_j > 0$ as $n \to \infty$. A state $E_j$ is said to be positive recurrent if it is recurrent and non-null (in that case, the recurrence time $\tau^{(j)}$ to $E_j$ has finite expectation $\mathbf{E}\tau^{(j)} < \infty$). It follows from Theorem 13.4.1 that, for an irreducible aperiodic chain, a state $E_j$ is ergodic if and only if it is positive recurrent. If at least one state is ergodic, all states are.

Theorem 13.4.3 Suppose a chain is irreducible and aperiodic (satisfies conditions (II)–(III)). Then only one of the following two alternatives can take place: either all the states are null or they are all ergodic. The existence of an absolutely convergent solution to system (13.4.2) is necessary and sufficient for the chain to be ergodic.

Proof The first assertion of the theorem follows from the fact that, by the local renewal Theorem 10.2.2 for the random walk generated by the times of the chain's hitting the state $E_j$, the limit $\lim_{n\to\infty} p_{jj}(n)$ always exists and equals $(\mathbf{E}\tau^{(j)})^{-1}$.

Therefore, to prove sufficiency in the second assertion (the necessity follows from Theorem 13.4.1) we have, in the case of the existence of an absolutely convergent solution $\{\pi_j\}$, to exclude the existence of null states. Assume the contrary, $p_{ij}(n) \to 0$. Choose $j$ such that $\pi_j > 0$. Then

$$0 < \pi_j = \sum_i \pi_i\, p_{ij}(n) \to 0$$

as $n \to \infty$ by dominated convergence. This contradiction completes the proof of the theorem. □
13.4.2 The Law of Large Numbers and the Central Limit Theorem
for  the  Number  of  Visits  to  a  Given  State 


In conclusion of this section we will give two assertions about the limiting behaviour, as $n \to \infty$, of the number $m_j(n)$ of visits of the system to a fixed state $E_j$ by the time $n$. Let $\tau^{(j)}$ be the recurrence time to the state $E_j$.


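Theorem 13.4.4 Let the chain be ergodic and, at the initial time epoch, be at an arbitrary state $E_s$. Then, as $n \to \infty$,

$$\frac{\mathbf{E}\, m_j(n)}{n} \to \pi_j, \qquad \frac{m_j(n)}{n} \xrightarrow{\text{a.s.}} \pi_j.$$

If additionally $\operatorname{Var}(\tau^{(j)}) = \sigma_j^2 < \infty$ then

$$\mathbf{P}\left( \frac{m_j(n) - n\pi_j}{\sigma_j \sqrt{n \pi_j^3}} < x \,\Big|\, X_0 = s \right) \to \Phi(x)$$

as $n \to \infty$, where $\Phi(x)$ is, as before, the distribution function of the normal law with parameters $(0, 1)$.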


Proof Note that the sequence $m_j(n) + 1$ coincides with the renewal process formed by the random variables $\tau_1, \tau_2, \tau_3, \ldots$, where $\tau_1$ is the time of the first visit to the state $E_j$ by the system which starts at $E_s$ and $\tau_k \overset{d}{=} \tau^{(j)}$ for $k \ge 2$. Clearly, by the Markov property all $\tau_k$ are independent. Since $\tau_1 \ge 0$ is a proper random variable, Theorem 13.4.4 is a simple consequence of the generalisations of Theorems 10.1.1, 11.5.1, and 10.5.2 that were stated in Remarks 10.1.1, 11.5.1 and 10.5.1, respectively.

The  theorem  is  proved.  □ 
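The law of large numbers for the visit counts can be observed in a simulation. The sketch below is an illustration under assumptions: it uses a hypothetical 3-state chain (the player B matrix of Example 13.4.1 with $p = 0.5$, $q = 0.3$, $\varepsilon = 0.1$, whose stationary law is $(0.525, 0.2, 0.275)$), a fixed seed, and the illustrative function name `visit_fractions`.

```python
import random

def visit_fractions(P, start, n, seed=0):
    """Simulate n steps of the chain and return the fractions m_j(n) / n."""
    rng = random.Random(seed)
    m = len(P)
    counts = [0] * m
    state = start
    for _ in range(n):
        # draw the next state from the row of transition probabilities
        state = rng.choices(range(m), weights=P[state])[0]
        counts[state] += 1
    return [c / n for c in counts]

P = [
    [0.6, 0.2, 0.2],
    [0.5, 0.2, 0.3],
    [0.4, 0.2, 0.4],
]
stationary_law = [0.525, 0.2, 0.275]
fractions = visit_fractions(P, start=2, n=100_000)
```

The entries of `fractions` should be close to `stationary_law` regardless of the starting state; the size of the fluctuations is of order $n^{-1/2}$, as the central limit part of the theorem suggests.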

Summarising the contents of this section, one can note that studying the sequences of dependent trials forming homogeneous Markov chains with discrete sets of states can essentially be carried out with the help of results obtained for sequences of independent random variables. Studying other types of dependent trials requires, as a rule, other approaches.


13.5*  The  Behaviour  of  Transition  Probabilities  for  Reducible 
Chains 

Now consider a finite Markov chain of the general type. As we saw, its state space consists of the class of inessential states $S^0$ and several classes $S^1, \ldots, S^l$ of essential states. To clarify the nature of the asymptotic behaviour of $p_{ij}(n)$ for such chains, it suffices to consider the case where essential states constitute a single class without subclasses ($l = 1$). Here, the matrix of transition probabilities $p_{ij}(n)$ has the form depicted in Fig. 13.4.

Fig. 13.4 The structure of the matrix of transition probabilities of a periodic Markov chain with the class $S^0$ of inessential states: an illustration to the proof of Theorem 13.2.3

By virtue of the ergodic theorem, the entries of the submatrix $L$ have positive limits $\pi_j$. Thus it remains to analyse the behaviour of the entries in the upper part of the matrix.

Theorem 13.5.1 Let $E_i \in S^0$. Then

$$\lim_{t\to\infty} p_{ij}(t) = \begin{cases} 0, & \text{if } E_j \in S^0,\\ \pi_j > 0, & \text{if } E_j \in S^1. \end{cases}$$


Proof Let $E_j \in S^0$. Set

$$A_j(t) := \max_{E_i \in S^0} p_{ij}(t).$$

For any essential state $E_r$ there exists an integer $t_r$ such that $p_{ir}(t_r) > 0$. Since transition probabilities in $L$ are all positive starting from some step, there exists an $s$ such that $p_{il}(s) > 0$ for $E_i \in S^0$ and all $E_l \in S^1$. Therefore, for sufficiently large $t$,

$$p_{ij}(t) = \sum_{E_k \in S^0} p_{ik}(s)\, p_{kj}(t-s) \le A_j(t-s) \sum_{E_k \in S^0} p_{ik}(s) = A_j(t-s)\, q(i),$$

where

$$q(i) := \sum_{E_k \in S^0} p_{ik}(s) = 1 - \sum_{E_k \in S^1} p_{ik}(s) < 1.$$

If we put $q := \max_{E_i \in S^0} q(i)$, then the displayed inequality implies that

$$A_j(t) \le q\, A_j(t-s) \le \cdots \le q^{\lfloor t/s \rfloor}.$$

Thus $\lim_{t\to\infty} p_{ij}(t) \le \lim_{t\to\infty} A_j(t) = 0$.

Now let $E_i \in S^0$ and $E_j \in S^1$. One has

$$p_{ij}(t+s) = \sum_k p_{ik}(t)\, p_{kj}(s) = \sum_{E_k \in S^0} p_{ik}(t)\, p_{kj}(s) + \sum_{E_k \in S^1} p_{ik}(t)\, p_{kj}(s).$$

Letting $t$ and $s$ go to infinity, we see that the first sum in the last expression is $o(1)$. In the second sum,

$$\sum_{E_k \in S^1} p_{ik}(t) = 1 + o(1); \qquad p_{kj}(s) = \pi_j + o(1).$$

Therefore

$$p_{ij}(t+s) = \pi_j \sum_{E_k \in S^1} p_{ik}(t) + o(1) = \pi_j + o(1)$$

as $t, s \to \infty$. The theorem is proved. □
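The statement of Theorem 13.5.1 can be illustrated on a small reducible chain. In the sketch below the 4-state matrix is hypothetical: states 0 and 1 form $S^0$, states 2 and 3 form the single essential class $S^1$, whose stationary law is $(3/7, 4/7)$. The code raises the matrix to a high power by plain repeated multiplication.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_pow(P, t):
    """t-step transition matrix P^t for an integer t >= 1."""
    result = P
    for _ in range(t - 1):
        result = mat_mul(result, P)
    return result

P = [
    [0.5, 0.2, 0.3, 0.0],   # E_0 in S^0
    [0.1, 0.4, 0.0, 0.5],   # E_1 in S^0
    [0.0, 0.0, 0.6, 0.4],   # E_2 in S^1
    [0.0, 0.0, 0.3, 0.7],   # E_3 in S^1
]
limit_row = [0.0, 0.0, 3.0 / 7.0, 4.0 / 7.0]
P_100 = mat_pow(P, 100)
```

Every row of `P_100`, including those of the inessential states, is numerically indistinguishable from `limit_row`.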

Using Theorem 13.5.1, it is not difficult to see that the existence of the limit

$$\lim_{n\to\infty} p_{ij}(n) = \pi_j \ge 0$$

is a necessary and sufficient condition for the chain to have two classes $S^0$ and $S^1$, of which $S^1$ contains no subclasses.


13.6  Markov  Chains  with  Arbitrary  State  Spaces.  Ergodicity  of 
Chains  with  Positive  Atoms 

13.6.1  Markov  Chains  with  Arbitrary  State  Spaces 

The Markov chains $X = \{X_n\}$ considered so far have taken values in the countable sets $\{1, 2, \ldots\}$ or $\{0, 1, \ldots\}$; such chains are called countable (denumerable) or discrete. Now we will consider Markov chains with values in an arbitrary set of states $\mathcal{X}$ endowed with a $\sigma$-algebra $\mathfrak{B}_{\mathcal{X}}$ of subsets of $\mathcal{X}$. The pair $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ forms a (measurable) state space of the chain $\{X_n\}$. Further let $(\Omega, \mathfrak{F}, \mathbf{P})$ be the underlying probability space. A measurable mapping $Y$ of the space $(\Omega, \mathfrak{F})$ into $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ is called an $\mathcal{X}$-valued random element. If $\mathcal{X} = \mathbb{R}$ and $\mathfrak{B}_{\mathcal{X}}$ is the $\sigma$-algebra of Borel sets on the line, then $Y$ will be a conventional random variable. The mapping $Y$ could be the identity, in which case $(\Omega, \mathfrak{F}) = (\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ is also called a sample space.

Consider a sequence $\{X_n\}$ of $\mathcal{X}$-valued random elements and denote by $\mathfrak{F}_{k,m}$, $m \ge k$, the $\sigma$-algebra generated by the elements $X_k, \ldots, X_m$ (i.e. by events of the form $\{X_k \in B_k\}, \ldots, \{X_m \in B_m\}$, $B_i \in \mathfrak{B}_{\mathcal{X}}$, $i = k, \ldots, m$). It is evident that $\mathfrak{F}_n := \mathfrak{F}_{0,n}$ form a non-decreasing sequence $\mathfrak{F}_0 \subset \mathfrak{F}_1 \subset \cdots \subset \mathfrak{F}_n \subset \cdots$. The conditional expectation $\mathbf{E}(\xi \mid \mathfrak{F}_{k,m})$ will sometimes also be denoted by $\mathbf{E}(\xi \mid X_k, \ldots, X_m)$.

Definition 13.6.1 An $\mathcal{X}$-valued Markov chain is a sequence of $\mathcal{X}$-valued elements $X_n$ such that, for any $B \in \mathfrak{B}_{\mathcal{X}}$,

$$\mathbf{P}(X_{n+1} \in B \mid \mathfrak{F}_n) = \mathbf{P}(X_{n+1} \in B \mid X_n) \quad \text{a.s.} \tag{13.6.1}$$

In the sequel, the words “almost surely” will, as a rule, be omitted.

By the properties of conditional expectations, relation (13.6.1) is clearly equivalent to the condition: for any measurable function $f : \mathcal{X} \to \mathbb{R}$, one has

$$\mathbf{E}\big(f(X_{n+1}) \mid \mathfrak{F}_n\big) = \mathbf{E}\big(f(X_{n+1}) \mid X_n\big). \tag{13.6.2}$$

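Definition 13.6.1 is equivalent to the following.

Definition 13.6.2 A sequence $X = \{X_n\}$ forms a Markov chain if, for any $A \in \mathfrak{F}_{n+1,\infty}$,

$$\mathbf{P}(A \mid \mathfrak{F}_n) = \mathbf{P}(A \mid X_n) \tag{13.6.3}$$

or, which is the same, for any $\mathfrak{F}_{n+1,\infty}$-measurable function $f(\omega)$,

$$\mathbf{E}\big(f(\omega) \mid \mathfrak{F}_n\big) = \mathbf{E}\big(f(\omega) \mid X_n\big). \tag{13.6.4}$$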

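Proof of equivalence We have to show that (13.6.2) implies (13.6.3). First take any $B_1, B_2 \in \mathfrak{B}_{\mathcal{X}}$ and let $A := \{X_{n+1} \in B_1, X_{n+2} \in B_2\}$. Then, by virtue of (13.6.2),

$$
\begin{aligned}
\mathbf{P}(A \mid \mathfrak{F}_n) &= \mathbf{E}\big[ I(X_{n+1} \in B_1)\, \mathbf{P}(X_{n+2} \in B_2 \mid \mathfrak{F}_{n+1}) \mid \mathfrak{F}_n \big] \\
&= \mathbf{E}\big[ I(X_{n+1} \in B_1)\, \mathbf{P}(X_{n+2} \in B_2 \mid X_{n+1}) \mid \mathfrak{F}_n \big] \\
&= \mathbf{P}(A \mid X_n).
\end{aligned}
$$

This implies equality (13.6.3) for any $A \in \mathfrak{A}_{n+1,n+2}$, where $\mathfrak{A}_{k,m}$ is the algebra generated by sets $\{X_k \in B_k, \ldots, X_m \in B_m\}$. It is clear that $\mathfrak{A}_{n+1,n+2}$ generates $\mathfrak{F}_{n+1,n+2}$. Now let $A \in \mathfrak{F}_{n+1,n+2}$. Then, by the approximation theorem, there exist $A_k \in \mathfrak{A}_{n+1,n+2}$ such that $d(A, A_k) \to 0$ (see Sect. 3.4). From this it follows that $I(A_k) \xrightarrow{p} I(A)$ and, by the properties of conditional expectations (see Sect. 4.8.2),

$$\mathbf{P}(A_k \mid \mathfrak{F}^*) \xrightarrow{p} \mathbf{P}(A \mid \mathfrak{F}^*),$$

where $\mathfrak{F}^* \subset \mathfrak{F}$ is some $\sigma$-algebra. Put $P_A = P_A(\omega) := \mathbf{P}(A \mid X_n)$. We know that, for $A_k \in \mathfrak{A}_{n+1,n+2}$,

$$\mathbf{E}(P_{A_k}; B) = \mathbf{P}(A_k B) \tag{13.6.5}$$

for any $B \in \mathfrak{F}_n$ (this just means that $P_{A_k}(\omega) = \mathbf{P}(A_k \mid \mathfrak{F}_n)$). Again making use of the properties of conditional expectations (the dominated convergence theorem, see Sect. 4.8.2) and passing to the limit in (13.6.5), we obtain that $\mathbf{E}(P_A; B) = \mathbf{P}(AB)$. This proves (13.6.3) for $A \in \mathfrak{F}_{n+1,n+2}$.

Repeating the above argument $m$ times, we prove (13.6.3) for $A \in \mathfrak{F}_{n+1,m}$. Using a similar scheme, we can proceed to the case of $A \in \mathfrak{F}_{n+1,\infty}$. □

Note that (13.6.3) can easily be extended to events $A \in \mathfrak{F}_{n,\infty}$. In the above proof of equivalence, one could work from the very beginning with $A \in \mathfrak{F}_{n,\infty}$ (first with $A \in \mathfrak{A}_{n,n+2}$, and so on).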

We  will  give  one  more  equivalent  definition  of  the  Markov  property. 
Definition 13.6.3 A sequence $\{X_n\}$ forms a Markov chain if, for any events $A \in \mathfrak{F}_n$ and $B \in \mathfrak{F}_{n,\infty}$,

$$\mathbf{P}(AB \mid X_n) = \mathbf{P}(A \mid X_n)\, \mathbf{P}(B \mid X_n). \tag{13.6.6}$$

This property means that the future is conditionally independent of the past given the present (conditional independence of $\mathfrak{F}_n$ and $\mathfrak{F}_{n,\infty}$ given $X_n$).

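Proof of the equivalence Assume that (13.6.4) holds. Then, for $A \in \mathfrak{F}_n$ and $B \in \mathfrak{F}_{n,\infty}$,

$$
\begin{aligned}
\mathbf{P}(AB \mid X_n) &= \mathbf{E}\big[ \mathbf{E}(I_A I_B \mid \mathfrak{F}_n) \mid X_n \big] = \mathbf{E}\big[ I_A\, \mathbf{E}(I_B \mid \mathfrak{F}_n) \mid X_n \big] \\
&= \mathbf{E}\big[ I_A\, \mathbf{E}(I_B \mid X_n) \mid X_n \big] = \mathbf{E}(I_B \mid X_n)\, \mathbf{E}(I_A \mid X_n),
\end{aligned}
$$

where $I_A$ is the indicator of the event $A$.

Conversely, let (13.6.6) hold. Then

$$
\begin{aligned}
\mathbf{P}(AB) &= \mathbf{E}\,\mathbf{P}(AB \mid X_n) = \mathbf{E}\,\mathbf{P}(A \mid X_n)\,\mathbf{P}(B \mid X_n) \\
&= \mathbf{E}\,\mathbf{E}\big[ I_A\, \mathbf{P}(B \mid X_n) \mid X_n \big] = \mathbf{E}\, I_A\, \mathbf{P}(B \mid X_n).
\end{aligned}
\tag{13.6.7}
$$

On the other hand,

$$\mathbf{P}(AB) = \mathbf{E}\, I_A I_B = \mathbf{E}\, I_A\, \mathbf{P}(B \mid \mathfrak{F}_n). \tag{13.6.8}$$

Since (13.6.7) and (13.6.8) hold for any $A \in \mathfrak{F}_n$, this means that

$$\mathbf{P}(B \mid X_n) = \mathbf{P}(B \mid \mathfrak{F}_n). \qquad \square$$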

Thus, let $\{X_n\}$ be an $\mathcal{X}$-valued Markov chain. Then, by the properties of conditional expectations,

$$\mathbf{P}(X_{n+1} \in B \mid X_n) = P_{(n)}(X_n, B),$$

where the function $P_{(n)}(x, B)$ is, for each $B \in \mathfrak{B}_{\mathcal{X}}$, measurable in $x$ with respect to the $\sigma$-algebra $\mathfrak{B}_{\mathcal{X}}$. In what follows, we will assume that the functions $P_{(n)}(x, B)$ are conditional distributions (see Definition 4.9.1), i.e., for each $x \in \mathcal{X}$, $P_{(n)}(x, B)$ is a probability distribution in $B$. Conditional distributions $P_{(n)}(x, B)$ always exist if the $\sigma$-algebra $\mathfrak{B}_{\mathcal{X}}$ is countably-generated, i.e. generated by a countable collection of subsets of $\mathcal{X}$ (see [27]). This condition is always met if $\mathcal{X} = \mathbb{R}^k$ and $\mathfrak{B}_{\mathcal{X}}$ is the $\sigma$-algebra of Borel sets. In our case, there is an additional problem that the “null probability” sets $N \subset \mathcal{X}$, on which one can arbitrarily vary $P_{(n)}(x, B)$, can depend on the distribution of $X_n$, since the “null probability” is with respect to the distribution of $X_n$.

Definition 13.6.4 A Markov chain $X = \{X_n\}$ is called homogeneous if there exist conditional distributions $P_{(n)}(x, B) = P(x, B)$ independent of $n$ and the initial value $X_0$ (or the distributions of $X_n$). The function $P(x, B)$ is called the transition probability (or transition function) of the homogeneous Markov chain. It can be graphically written as

$$P(x, B) = \mathbf{P}(X_1 \in B \mid X_0 = x). \tag{13.6.9}$$

If the Markov chain is countable, $\mathcal{X} = \{1, 2, \ldots\}$, then, in the notation of Sect. 13.1, one has $P(i, \{j\}) = p_{ij} = p_{ij}(1)$.


The transition probability and initial distribution (of $X_0$) completely determine the joint distribution of $X_0, \ldots, X_n$ for any $n$. Indeed, by the total probability formula and the Markov property

$$\mathbf{P}(X_0 \in B_0, \ldots, X_n \in B_n) = \int_{y_0 \in B_0} \cdots \int_{y_n \in B_n} \mathbf{P}(X_0 \in dy_0)\, P(y_0, dy_1) \cdots P(y_{n-1}, dy_n). \tag{13.6.10}$$
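For a countable chain the integrals in (13.6.10) reduce to products of matrix entries along a path. The following minimal sketch illustrates this special case; the two-state matrix, the initial law and the function name `path_probability` are hypothetical.

```python
from itertools import product

def path_probability(initial, P, path):
    """P(X_0 = path[0], ..., X_n = path[n]) for a countable chain, cf. (13.6.10)."""
    prob = initial[path[0]]
    for current, nxt in zip(path, path[1:]):
        prob *= P[current][nxt]   # one factor P(y_{k-1}, {y_k}) per transition
    return prob

initial = [0.5, 0.5]              # hypothetical distribution of X_0
P = [[0.9, 0.1],
     [0.5, 0.5]]                  # hypothetical transition matrix
example = path_probability(initial, P, (0, 0, 1))
total = sum(path_probability(initial, P, path) for path in product(range(2), repeat=4))
```

Summing `path_probability` over all paths of a fixed length gives 1, which is a quick consistency check that these products do define a joint distribution.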


A Markov chain with the initial value $X_0 = x$ will be denoted by $\{X_n(x)\}$.

In applications, Markov chains are usually given by their conditional distributions $P(x, B)$ or, in a “stronger form”, by explicit formulas expressing $X_{n+1}$ in terms of $X_n$ and certain “control” elements (see Examples 13.4.2, 13.4.3, 13.6.1, 13.6.2, 13.7.1-13.7.3) which enable one to immediately write down transition probabilities. In such cases, as we already mentioned, the joint distribution of $(X_0, \ldots, X_n)$ can be defined in terms of the initial distribution of $X_0$ and the transition function $P(x, B)$ by formula (13.6.10). It is easily seen that the sequence $\{X_n\}$ with so defined joint distributions satisfies all the definitions of a Markov chain and has transition function $P(x, B)$. In what follows, wherever it is needed, we will assume condition (13.6.10) is satisfied. It can be considered as one more definition of a Markov chain, but a stronger one than Definitions 13.6.2-13.6.4, for it explicitly gives (or uses) the transition function $P(x, B)$.

One of the main objects of study will be the asymptotic behaviour of the $n$-step transition probability:

$$P(x, n, B) := \mathbf{P}(X_n(x) \in B) = \mathbf{P}(X_n \in B \mid X_0 = x).$$

The following recursive relation, which follows from the total probability formula (or from (13.6.10)), holds for this function:

$$\mathbf{P}(X_{n+1} \in B) = \mathbf{E}\,\mathbf{E}\big( I(X_{n+1} \in B) \mid \mathfrak{F}_n \big) = \int \mathbf{P}(X_n \in dy)\, P(y, B),$$

$$P(x, n+1, B) = \int P(x, n, dy)\, P(y, B). \tag{13.6.11}$$
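In the countable case the recursion (13.6.11) becomes a vector-matrix multiplication. The sketch below, with a hypothetical two-state matrix and the illustrative name `n_step`, computes $P(x, n, \{j\})$ by applying the recursion $n$ times.

```python
def n_step(P, x, n):
    """Return [P(x, n, {j})] by iterating P(x, n+1, {j}) = sum_y P(x, n, {y}) P(y, {j})."""
    m = len(P)
    dist = [0.0] * m
    dist[x] = 1.0                 # P(x, 0, .) is the unit mass at x
    for _ in range(n):
        dist = [sum(dist[y] * P[y][j] for y in range(m)) for j in range(m)]
    return dist

P = [[0.9, 0.1],
     [0.5, 0.5]]                  # hypothetical transition matrix
two_step = n_step(P, 0, 2)        # P(0, 2, .)
```

For this matrix `two_step` is `[0.86, 0.14]` up to rounding, which matches the direct sum $\sum_y p_{0y}\, p_{yj}$.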


Now note that the Markov property (13.6.3) of homogeneous chains can also be written in the form

$$\mathbf{P}(X_{n+k} \in B_k \mid \mathfrak{F}_n) = P(X_n, k, B_k),$$

or, more generally,

$$\mathbf{P}(X_{n+1} \in B_1, \ldots, X_{n+k} \in B_k \mid \mathfrak{F}_n) = \mathbf{P}\big(X_1^{\text{new}}(X_n) \in B_1, \ldots, X_k^{\text{new}}(X_n) \in B_k\big), \tag{13.6.12}$$

where $\{X_k^{\text{new}}(x)\}$ is a Markov chain independent of $\{X_n\}$ and having the same transition function as $\{X_n\}$ and the initial value $x$. Property (13.6.12) can be extended to a random time $n$. Recall the definition of a stopping time.

Definition 13.6.5 A random variable $\nu \ge 0$ is called a Markov or stopping time with respect to $\{\mathfrak{F}_n\}$ if $\{\nu \le n\} \in \mathfrak{F}_n$. In other words, whether the event $\{\nu \le n\}$ occurred or not is completely determined by the trajectory segment $X_0, X_1, \ldots, X_n$.

Note that, in Definition 13.6.5, by $\mathfrak{F}_n$ one often understands wider $\sigma$-algebras, the essential requirements being the relations $\{\nu \le n\} \in \mathfrak{F}_n$ and measurability of $X_0, \ldots, X_n$ with respect to $\mathfrak{F}_n$.

Denote by $\mathfrak{F}_\nu$ the $\sigma$-algebra of events $B$ such that $B \cap \{\nu = k\} \in \mathfrak{F}_k$. In other words, $\mathfrak{F}_\nu$ can be thought of as the $\sigma$-algebra generated by the sets $\{\nu = k\} B_k$, $B_k \in \mathfrak{F}_k$, i.e. by the trajectory of $\{X_n\}$ until time $\nu$.

Lemma 13.6.1 (The Strong Markov Property) For any $k \ge 1$ and $B_1, \ldots, B_k \in \mathfrak{B}_{\mathcal{X}}$,

$$\mathbf{P}(X_{\nu+1} \in B_1, \ldots, X_{\nu+k} \in B_k \mid \mathfrak{F}_\nu) = \mathbf{P}\big(X_1^{\text{new}}(X_\nu) \in B_1, \ldots, X_k^{\text{new}}(X_\nu) \in B_k\big),$$

where the process $\{X_k^{\text{new}}\}$ is defined in (13.6.12).

Thus, after a random stopping time $\nu$, the trajectory $X_{\nu+1}, X_{\nu+2}, \ldots$ will evolve according to the same laws as $X_1, X_2, \ldots$, but with the initial condition $X_\nu$. This property is called the strong Markov property. It will be used below for the first hitting times $\nu = \tau_V$ of certain sets $V \subset \mathcal{X}$ by $\{X_n\}$. We have already used this property tacitly in Sect. 13.4, when the set $V$ coincided with a point, which allowed us to cut the trajectory of $\{X_n\}$ into independent cycles.

Proof of Lemma 13.6.1 For the sake of simplicity, consider one-dimensional distributions. We have to prove that

$$\mathbf{P}(X_{\nu+1} \in B_1 \mid \mathfrak{F}_\nu) = P(X_\nu, B_1).$$

For any $A \in \mathfrak{F}_\nu$,

$$
\begin{aligned}
\mathbf{E}\big(P(X_\nu, B_1); A\big) &= \sum_n \mathbf{E}\big(P(X_n, B_1); A\{\nu = n\}\big) \\
&= \sum_n \mathbf{E}\,\mathbf{E}\big( I(A\{\nu = n\}\{X_{n+1} \in B_1\}) \mid \mathfrak{F}_n \big) \\
&= \sum_n \mathbf{P}\big(A\{\nu = n\}\{X_{n+1} \in B_1\}\big) = \mathbf{P}\big(A\{X_{\nu+1} \in B_1\}\big).
\end{aligned}
$$

But this just means that $P(X_\nu, B_1)$ is the required conditional expectation. The case of multi-dimensional distributions is dealt with in the same way, and we leave it to the reader. □


Now we turn to consider the asymptotic properties of distributions $P(x, n, B)$ as $n \to \infty$.


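Definition 13.6.6 A distribution $\pi(\cdot)$ on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ is called invariant if it satisfies the equation

$$\pi(B) = \int \pi(dy)\, P(y, B), \qquad B \in \mathfrak{B}_{\mathcal{X}}. \tag{13.6.13}$$

It follows from (13.6.11) that if $X_n$ has distribution $\pi$, then $X_{n+1}$ also has distribution $\pi$. The distribution $\pi$ is also called stationary.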

For  Markov  chains  in  arbitrary  state  spaces  X,  a  simple  and  complete  classifica¬ 
tion  similar  to  the  one  carried  out  for  countable  chains  in  Sect.  13.1  is  not  possible, 
although  some  notions  can  be  extended  to  the  general  case. 

Such  natural  and  important  notions  for  countable  chains  as,  say,  irreducibility  of 
a  chain,  take  in  the  general  case  another  form. 


Example 13.6.1 Let $X_{n+1} = X_n + \xi_n \pmod 1$ ($X_{n+1}$ is the fractional part of $X_n + \xi_n$), where the $\xi_n$ are independent and identically distributed and take with positive probabilities the two values 0 and $\sqrt{2}$. In this example, the chain “splits”, according to the initial state $x$, into a continual set of “subchains” with state spaces of the form $M_x = \{x + k\sqrt{2} \pmod 1,\ k = 0, 1, 2, \ldots\}$. It is evident that if $x_1 - x_2$ is not a multiple of $\sqrt{2}$ (mod 1), then $M_{x_1}$ and $M_{x_2}$ are disjoint, $\mathbf{P}(X_n(x_1) \in M_{x_2}) = 0$ and $\mathbf{P}(X_n(x_2) \in M_{x_1}) = 0$ for all $n$. Thus the chain is clearly reducible. Nevertheless, it turns out that the chain is ergodic in the following sense: for any $x$, $X_n(x) \Rightarrow U_{0,1}$ (that is, $P(x, n, [0, t]) \to t$) as $n \to \infty$ (see, e.g., [6], [18]). For the most commonly used irreducibility conditions, see Sect. 13.7.
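The convergence $P(x, n, [0, t]) \to t$ can be examined numerically, since $X_n(x)$ is the fractional part of $x + K\sqrt{2}$ with $K$ binomially distributed. The sketch below is an illustration only: it assumes $\mathbf{P}(\xi_n = \sqrt{2}) = 1/2$, uses floating-point arithmetic, and the function name `prob_interval` is hypothetical.

```python
import math

def prob_interval(x, n, t, p=0.5):
    """Return P(x, n, [0, t]) for the chain of Example 13.6.1 with P(xi = sqrt(2)) = p."""
    root2 = math.sqrt(2.0)
    log_p, log_q = math.log(p), math.log(1.0 - p)
    total = 0.0
    for k in range(n + 1):
        # K = number of steps equal to sqrt(2) is Binomial(n, p);
        # the weight is computed through logarithms to avoid overflow.
        log_weight = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                      + k * log_p + (n - k) * log_q)
        position = (x + k * root2) % 1.0   # fractional part of x + k*sqrt(2)
        if position <= t:
            total += math.exp(log_weight)
    return total

estimate = prob_interval(x=0.1, n=20000, t=0.3)
```

For large `n` the value of `estimate` is close to `t = 0.3`, although for every finite `n` the distribution is concentrated on finitely many points of $M_x$.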

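Definition 13.6.7 A chain is called periodic if there exist an integer $d \ge 2$ and a set $\mathcal{X}_1 \subset \mathcal{X}$ such that, for $x \in \mathcal{X}_1$, one has $P(x, n, \mathcal{X}_1) = \mathbf{P}(X_n(x) \in \mathcal{X}_1) = 1$ for $n = kd$, $k = 1, 2, \ldots$, and $P(x, n, \mathcal{X}_1) = 0$ for $n \ne kd$.

Periodicity means that the whole set of states $\mathcal{X}$ is decomposed into subclasses $\mathcal{X}_1, \ldots, \mathcal{X}_d$, such that $\mathbf{P}(X_1(x) \in \mathcal{X}_{k+1}) = 1$ for $x \in \mathcal{X}_k$, $k = 1, \ldots, d$, $\mathcal{X}_{d+1} = \mathcal{X}_1$. In the absence of such a property, the chain will be called aperiodic.

A state $x_0 \in \mathcal{X}$ is called an atom of the chain $X$ if, for any $x \in \mathcal{X}$,

$$\mathbf{P}\Big( \bigcup_{n=1}^{\infty} \{X_n(x) = x_0\} \Big) = 1.$$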

Example 13.6.2 Let $X_0 \ge 0$ and, for $n \ge 0$,

$$X_{n+1} = \begin{cases} (X_n + \xi_{n+1})^+ & \text{if } X_n > 0,\\ \eta_{n+1} & \text{if } X_n = 0, \end{cases}$$

where $\xi_n$ and $\eta_n \ge 0$, $n = 1, 2, \ldots$, are two sequences of independent random variables, identically distributed in each sequence. It is clear that $\{X_n\}$ is a Markov chain and, for $\mathbf{E}\xi_k < 0$, by the strong law of large numbers, this chain has an atom at the point $x_0 = 0$:

$$\mathbf{P}\Big( \bigcup_{n=1}^{\infty} \{X_n(x) = 0\} \Big) = \mathbf{P}\Big( \inf_k\, (x + S_k) \le 0 \Big) = 1,$$

where $S_k = \sum_{j=1}^{k} \xi_j$. This chain is a generalisation of the Markov chain from Example 13.4.3.

Markov  chains  in  an  arbitrary  state  space  X  are  rather  difficult  to  study.  However, 
if  a  chain  has  an  atom,  the  situation  may  become  much  simpler,  and  the  ergodic 
theorem  on  the  asymptotic  behaviour  of  P(x,n,  B)  as  n  — >  oo  can  be  proved  using 
the  approaches  considered  in  the  previous  sections. 


13.6.2  Markov  Chains  Having  a  Positive  Atom 


Let $x_0$ be an atom of a chain $\{X_n\}$. Set

$$\tau := \min\{k > 0 : X_k(x_0) = x_0\}.$$

This is a proper random variable ($\mathbf{P}(\tau < \infty) = 1$).

Definition 13.6.8 The atom $x_0$ is said to be positive if $\mathbf{E}\tau < \infty$.

In the terminology of Sect. 13.4, $x_0$ is a recurrent non-null (positive) state.

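To characterise convergence of distributions in arbitrary spaces, we will need the notions of the total variation distance and convergence in total variation. If $\mathbf{P}$ and $\mathbf{Q}$ are two distributions on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$, then the total variation distance between them is defined by

$$\|\mathbf{P} - \mathbf{Q}\| = 2 \sup_{B \in \mathfrak{B}_{\mathcal{X}}} |\mathbf{P}(B) - \mathbf{Q}(B)|.$$

One says that a sequence of distributions $\mathbf{P}_n$ on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ converges in total variation to $\mathbf{P}$ ($\mathbf{P}_n \xrightarrow{TV} \mathbf{P}$) if $\|\mathbf{P}_n - \mathbf{P}\| \to 0$ as $n \to \infty$. For more details, see Sect. 3.6.2 of Appendix 3.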
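
For distributions on a finite set the supremum in this definition is attained on the set where the first distribution exceeds the second, so the distance reduces to a sum of absolute differences. The sketch below illustrates this; the function name `total_variation` and the dictionary representation are choices made here for illustration only.

```python
def total_variation(p, q):
    """||P - Q|| = 2 sup_B |P(B) - Q(B)| for two distributions on a finite set.

    p and q are dicts mapping points to probabilities. The supremum is attained
    at B = {x : p(x) > q(x)}, which turns the distance into sum_x |p(x) - q(x)|.
    """
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.25, "c": 0.5}
print(total_variation(p, q))   # the sup is attained at B = {"a", "b"}
```

With this normalisation the distance ranges from 0 (identical distributions) to 2 (mutually singular distributions).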

As in Sect. 13.4, denote by $P_{x_0}(k, B)$ the “taboo probability”

$$P_{x_0}(k, B) := \mathbf{P}\big(X_k(x_0) \in B,\ X_1(x_0) \ne x_0, \ldots, X_{k-1}(x_0) \ne x_0\big)$$

of transition from $x_0$ into $B$ in $k$ steps without visiting the “forbidden” state $x_0$.


Theorem 13.6.1 If the chain $\{X_n\}$ has a positive atom and the g.c.d. of the possible values of $\tau$ is 1, then the chain is ergodic in the convergence in total variation sense: there exists a unique invariant distribution $\pi$ such that, for any $x \in \mathcal{X}$, as $n \to \infty$,

$$\|P(x, n, \cdot) - \pi(\cdot)\| \to 0. \tag{13.6.14}$$

Moreover, for any $B \in \mathfrak{B}_{\mathcal{X}}$,

$$\pi(B) = \frac{1}{\mathbf{E}\tau} \sum_{k=1}^{\infty} P_{x_0}(k, B). \tag{13.6.15}$$


If we denote by $X_n(\mu_0)$ a Markov chain with the initial distribution $\mu_0$ ($X_0 \subset= \mu_0$) and put

$$P(\mu_0, n, B) := \mathbf{P}(X_n(\mu_0) \in B) = \int \mu_0(dx)\, P(x, n, B),$$

then, as well as (13.6.14), we will also have that, as $n \to \infty$,

$$\|P(\mu_0, n, \cdot) - \pi(\cdot)\| \to 0 \tag{13.6.16}$$

for any initial distribution $\mu_0$.

The condition that there exists a positive atom is an analogue of conditions (I) and (II) of Theorem 13.4.1. A number of conditions sufficient for the finiteness of $\mathbf{E}\tau$ can be found in Sect. 13.7. The condition on the g.c.d. of possible values of $\tau$ is the aperiodicity condition.
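
For a finite chain every state can serve as an atom, so representation (13.6.15) can be checked numerically. The sketch below is an illustration under assumptions: the three-state transition matrix is an arbitrary example, and `invariant_via_atom` is a hypothetical helper that sums taboo probabilities until the remaining mass is negligible.

```python
def invariant_via_atom(P, x0=0, tol=1e-15, max_k=100_000):
    """Return (pi, E tau) with pi(j) = (1 / E tau) * sum_k P_{x0}(k, {j})."""
    m = len(P)
    v = list(P[x0])                  # k = 1: P_{x0}(1, {j}) = P(x0, j)
    totals = [0.0] * m
    e_tau = 0.0
    for _ in range(max_k):
        mass = sum(v)                # total mass of P_{x0}(k, .) equals P(tau >= k)
        if mass < tol:
            break
        e_tau += mass                # E tau = sum_k P(tau >= k)
        for j in range(m):
            totals[j] += v[j]
        v[x0] = 0.0                  # taboo: drop paths that have already hit x0
        v = [sum(v[i] * P[i][j] for i in range(m)) for j in range(m)]
    return [t / e_tau for t in totals], e_tau

# Assumed example: a birth-and-death chain on {0, 1, 2}
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi, e_tau = invariant_via_atom(P, x0=0)
pi_next = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
print(pi, e_tau, pi_next)
```

Comparing `pi` with `pi_next` checks invariance directly, and `e_tau` should agree with the reciprocal of the invariant probability of the atom.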


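Proof We will effectively repeat the proof of Theorem 13.4.1. First let $X_0 = x_0$. As in Theorem 13.4.1 (we keep the notation of that theorem), we find that

$$
\begin{aligned}
P(x_0, n, B) &= \sum_{k=1}^{n} \mathbf{P}(\gamma(n) = k)\, \mathbf{P}(X_n \in B \mid X_{n-k} = x_0,\ X_{n-k+1} \ne x_0, \ldots, X_{n-1} \ne x_0) \\
&= \sum_{k=1}^{n} \frac{\mathbf{P}(\gamma(n) = k)}{\mathbf{P}(\tau \ge k)}\, \mathbf{P}(\tau \ge k)\, \mathbf{P}(X_k \in B \mid X_0 = x_0,\ X_1 \ne x_0, \ldots, X_{k-1} \ne x_0) \\
&= \sum_{k=1}^{n} \frac{\mathbf{P}(\gamma(n) = k)}{\mathbf{P}(\tau \ge k)}\, P_{x_0}(k, B).
\end{aligned}
$$

For the measure $\pi$ defined in (13.6.15) one has

$$P(x_0, n, B) - \pi(B) = \sum_{k=1}^{n} \Big(\frac{\mathbf{P}(\gamma(n) = k)}{\mathbf{P}(\tau \ge k)} - \frac{1}{\mathbf{E}\tau}\Big) P_{x_0}(k, B) - \frac{1}{\mathbf{E}\tau} \sum_{k > n} P_{x_0}(k, B).$$

Since $\mathbf{P}(\gamma(n) = k) \le \mathbf{P}(\tau \ge k)$ and $P_{x_0}(k, B) \le \mathbf{P}(\tau \ge k)$ (see the proof of Theorem 13.4.1), one has, for any $N$,

$$\sup_{B} |P(x_0, n, B) - \pi(B)| \le \sum_{k=1}^{N} \Big|\frac{\mathbf{P}(\gamma(n) = k)}{\mathbf{P}(\tau \ge k)} - \frac{1}{\mathbf{E}\tau}\Big| + 2 \sum_{k > N} \mathbf{P}(\tau \ge k). \tag{13.6.17}$$

Further, since

$$\mathbf{P}(\gamma(n) = k) \to \mathbf{P}(\tau \ge k)/\mathbf{E}\tau, \qquad \sum_{k=1}^{\infty} \mathbf{P}(\tau \ge k) = \mathbf{E}\tau < \infty,$$

the right-hand side of (13.6.17) can be made arbitrarily small by choosing $N$ and then $n$. Therefore,

$$\lim_{n \to \infty} \sup_{B} |P(x_0, n, B) - \pi(B)| = 0.$$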

Now consider an arbitrary initial state $x \in \mathcal{X}$, $x \ne x_0$. Since $x_0$ is an atom, for the probabilities

$$F(x, k, x_0) := \mathbf{P}(X_k(x) = x_0,\ X_1 \ne x_0, \ldots, X_{k-1} \ne x_0)$$

of hitting $x_0$ for the first time on the $k$-th step, one has

$$\sum_{k} F(x, k, x_0) = 1, \qquad P(x, n, B) = \sum_{k=1}^{n} F(x, k, x_0) P(x_0, n - k, B),$$

$$\|P(x, n, \cdot) - \pi(\cdot)\| \le \sum_{k \le n/2} F(x, k, x_0)\, \|P(x_0, n - k, \cdot) - \pi(\cdot)\| + 2 \sum_{k > n/2} F(x, k, x_0) \to 0$$

as $n \to \infty$.

Relation (13.6.16) follows from the fact that

$$\|P(\mu_0, n, \cdot) - \pi(\cdot)\| \le \int \mu_0(dx)\, \|P(x, n, \cdot) - \pi(\cdot)\| \to 0$$

by the dominated convergence theorem.

Further, from the convergence of $P(x, n, \cdot)$ in total variation it follows that

$$\int P(x, n, dy) P(y, B) \to \int \pi(dy) P(y, B).$$

Since the left-hand side of this relation is equal to $P(x, n + 1, B)$ by virtue of (13.6.11) and converges to $\pi(B)$, one has (13.6.13), and hence $\pi$ is an invariant measure.

Now assume that $\pi_1$ is another invariant distribution. Then

$$\pi_1(\cdot) = P(\pi_1, n, \cdot) \xrightarrow{TV} \pi(\cdot), \qquad \pi_1 = \pi.$$

The theorem is proved. □


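Returning to Example 13.6.2, we show that the conditions of Theorem 13.6.1 are met provided that $\mathbf{E}\xi_k < 0$ and $\mathbf{E}\eta_k < \infty$. Indeed, put

$$\eta(-x) := \min\Big\{k : S_k = \sum_{j=1}^{k} \xi_j \le -x\Big\}.$$

By the renewal Theorem 10.1.1,

$$H(x) = \mathbf{E}\eta(-x) \sim \frac{x}{|\mathbf{E}\xi_1|} \quad \text{as } x \to \infty$$

for $\mathbf{E}\xi_1 < 0$, and therefore there exist constants $c_1$ and $c_2$ such that $H(x) \le c_1 + c_2 x$ for all $x \ge 0$. Hence, for the atom $x_0 = 0$, we obtain that

$$\mathbf{E}\tau = \int_0^{\infty} \mathbf{P}(\eta_1 \in dx)\, H(x) \le c_1 + c_2 \int_0^{\infty} x\, \mathbf{P}(\eta_1 \in dx) = c_1 + c_2 \mathbf{E}\eta_1 < \infty.$$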


13.7*  Ergodicity  of  Harris  Markov  Chains 
13.7.1  The  Ergodic  Theorem 

In this section we will consider the problem of establishing ergodicity of Markov chains in arbitrary state spaces $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$. A lot of research has been done on this problem, the most important advancements being associated with the names of W. Döblin, J.L. Doob, T.E. Harris and S. Orey. Until recently, this research area had been considered as a rather difficult one, and not without reason. However, the construction of an artificial atom suggested by K.B. Athreya, P.E. Ney and E. Nummelin (see, e.g. [6, 27, 29]) greatly simplified considerations and allowed the proof of ergodicity by reducing the general case to the special case discussed in the last section.

In what follows, the notion of a “Harris chain” will play an important role. For a fixed set $V \in \mathfrak{B}_{\mathcal{X}}$, define the random variable

$$\tau_V(x) = \min\{k \ge 1 : X_k(x) \in V\},$$

the time of the first hitting of $V$ by the chain starting from the state $x$ (we put $\tau_V(x) = \infty$ if all $X_k(x) \notin V$).

Definition 13.7.1 A Markov chain $X = \{X_n\}$ in $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ is said to be a Harris chain (or Harris irreducible) if there exists a set $V \in \mathfrak{B}_{\mathcal{X}}$, a probability measure $\mu$ on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$, and numbers $n_0 \ge 1$, $p \in (0, 1)$ such that

(I$_0$) $\mathbf{P}(\tau_V(x) < \infty) = 1$ for all $x \in \mathcal{X}$; and

(II) $P(x, n_0, B) \ge p\mu(B)$ for all $x \in V$, $B \in \mathfrak{B}_{\mathcal{X}}$.


Condition (I$_0$) plays the role of an irreducibility condition: starting from any point $x \in \mathcal{X}$, the trajectory of $X_n$ will sooner or later visit the set $V$. Condition (II) guarantees that, after $n_0$ steps since hitting $V$, the distribution of the walking particle will be minorised by a common “distribution” $p\mu(\cdot)$. This condition is sometimes called a “mixing condition”; it ensures a “partial loss of memory” about the trajectory’s past. This is not the case for the chain from Example 13.6.1, for which condition (II) does not hold for any $V$, $p$ or $n_0$ (the $P(x, \cdot)$ form a collection of mutually singular distributions which are singular with respect to Lebesgue measure).

If a chain has an atom $x_0$, then conditions (I$_0$) and (II) are always satisfied for $V = \{x_0\}$, $n_0 = 1$, $p = 1$, and $\mu(\cdot) = P(x_0, \cdot)$, so that such a chain is a Harris chain.

The set $V$ is usually chosen to be a “compact” set (if $\mathcal{X} = \mathbb{R}^k$, it will be a bounded set), for otherwise one cannot, as a rule, obtain inequalities in (II). If the space $\mathcal{X}$ is “compact” itself (a finite or bounded subset of $\mathbb{R}^k$), condition (II) can be met for $V = \mathcal{X}$ (condition (I$_0$) then always holds). For example, if $\{X_n\}$ is a finite, irreducible and aperiodic chain, then by Theorem 13.4.2 there exists an $n_0$ such that $P(i, n_0, j) \ge p > 0$ for all $i$ and $j$. Therefore condition (II) holds for $V = \mathcal{X}$ if one takes $\mu$ to be a uniform distribution on $\mathcal{X}$.
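
The finite-chain case can be made concrete. The sketch below searches for the first power of the transition matrix with strictly positive entries and derives a minorisation constant for the uniform measure; the example matrix and the helper names `mat_mul` and `minorisation` are assumptions introduced only for this illustration.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def minorisation(P, max_power=100):
    """Find n0 with all entries of P^{n0} positive; return (n0, p) for uniform mu.

    If every entry of P^{n0} is at least delta, then with mu uniform on m states
    P(i, n0, B) >= delta * |B| = (m * delta) * mu(B), so (II) holds with p = m * delta.
    """
    m = len(P)
    Pn = P
    for n0 in range(1, max_power + 1):
        delta = min(min(row) for row in Pn)
        if delta > 0:
            return n0, m * delta
        Pn = mat_mul(Pn, P)
    raise ValueError("no strictly positive power found up to max_power")

# Assumed example: irreducible aperiodic chain on {0, 1, 2}
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
n0, p = minorisation(P)
print(n0, p)
```

The constant obtained this way is usually not the best possible one; it only shows that condition (II) holds for $V$ equal to the whole state space.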

One could interpret condition (II) as that of the presence, in all distributions $P(x, n_0, \cdot)$ for $x \in V$, of a component which is absolutely continuous with respect to the measure $\mu$:

$$\inf_{x \in V} \frac{P(x, n_0, dy)}{\mu(dy)} \ge p > 0.$$


We will also need a condition of “positivity” (positive recurrence) of the set $V$ (or that of “positivity” of the chain):

(I) $\sup_{x \in V} \mathbf{E}\tau_V(x) < \infty$,

and the aperiodicity condition which will be written in the following form. Let $X_k(\mu)$ be a Markov chain with an initial value $X_0 \subset= \mu$, where $\mu$ is from condition (II). Put

$$\tau_V(\mu) := \min\{k \ge 0 : X_k(\mu) \in V\}.$$

It is evident that $\tau_V(\mu)$ is, by virtue of (I$_0$), a proper random variable. Denote by $n_1, n_2, \ldots$ the possible values of $\tau_V(\mu)$, i.e. the values for which

$$\mathbf{P}(\tau_V(\mu) = n_k) > 0, \quad k = 1, 2, \ldots.$$

Then the aperiodicity condition will have the following form.

(III) There exists a $k \ge 1$ such that

$$\text{g.c.d.}\{n_0 + n_1, n_0 + n_2, \ldots, n_0 + n_k\} = 1,$$

where $n_0$ is from condition (II).

Condition (III) is always satisfied if (II) holds for $n_0 = 1$ and $\mu(V) > 0$ (then $n_1 = 0$, $n_0 + n_1 = 1$).

Verifying condition (I) usually requires deriving bounds for $\mathbf{E}\tau_V(x)$ for $x \notin V$ which would automatically imply (I$_0$) (see the examples below).


Theorem 13.7.1 Suppose conditions (I$_0$), (I), (II) and (III) are satisfied for a Markov chain $X$, i.e. the chain is an aperiodic positive Harris chain. Then there exists a unique invariant distribution $\pi$ such that, for any initial distribution $\mu_0$, as $n \to \infty$,

$$\|P(\mu_0, n, \cdot) - \pi(\cdot)\| \to 0. \tag{13.7.1}$$

The proof is based on the use of the above-mentioned construction of an “artificial atom” and reduction of the problem to Theorem 13.6.1. This allows one to obtain, in the course of the proof, a representation for the invariant measure $\pi$ similar to (13.6.15) (see (13.7.5)).

A remarkable fact is that the conditions of Theorem 13.7.1 are necessary for convergence (13.7.1) (for more details, see [6]).


Proof of Theorem 13.7.1 For simplicity’s sake, assume that $n_0 = 1$. First we will construct an “extended” Markov chain $X^* = \{X_n^*\} = \{X_n, \omega(n)\}$, $\omega(n)$ being a sequence of independent identically distributed random variables with

$$\mathbf{P}(\omega(n) = 1) = p, \qquad \mathbf{P}(\omega(n) = 0) = 1 - p.$$

The joint distribution of $(X(n), \omega(n))$ in the state space

$$\mathcal{X}^* := \mathcal{X} \times \{0, 1\} = \{x^* = (x, \delta) : x \in \mathcal{X};\ \delta = 0, 1\}$$

and the transition function $P^*$ of the chain $X^*$ are defined as follows (the notation $X_n^*(x^*)$ has the same meaning as $X_n(x)$):

$$\mathbf{P}(X_1^*(x^*) \in (B, \delta)) =: P^*(x^*, (B, \delta)) = P(x, B)\, \mathbf{P}(\omega(1) = \delta) \quad \text{for } x \notin V$$

(i.e., for $X_n \notin V$, the components of $X_{n+1}^*$ are “chosen at random” independently with the respective marginal distributions). But if $x \in V$, the distribution of $X_1^*(x^*)$ is given by

$$\mathbf{P}(X_1^*((x, 1)) \in (B, \delta)) = P^*((x, 1), (B, \delta)) = \mu(B)\, \mathbf{P}(\omega(1) = \delta),$$
$$\mathbf{P}(X_1^*((x, 0)) \in (B, \delta)) = P^*((x, 0), (B, \delta)) = Q(x, B)\, \mathbf{P}(\omega(1) = \delta),$$

where

$$Q(x, B) := (P(x, B) - p\mu(B))/(1 - p),$$

so that, for any $B \in \mathfrak{B}_{\mathcal{X}}$,

$$p\mu(B) + (1 - p) Q(x, B) = P(x, B). \tag{13.7.2}$$

Thus $\mathbf{P}(\omega(n + 1) = 1 \mid X_n^*) = p$ for any values of $X_n^*$. However, when “choosing” the value $X_{n+1}$ there occurs (only when $X_n \in V$) a partial randomisation (or splitting): for $X_n \in V$, we let $\mathbf{P}(X_{n+1} \in B \mid X_n^*)$ be equal to the value $\mu(B)$ (not depending on $X_n \in V$!) provided that $\omega(n) = 1$. If $\omega(n) = 0$, then the value of the probability is taken to be $Q(X_n, B)$. It is evident that, by virtue of condition (II) (for $n_0 = 1$), $\mu(B)$ and $Q(x, B)$ are probability distributions, and by equality (13.7.2) the first component $X_n$ of the process $X_n^*$ has the property $\mathbf{P}(X_{n+1} \in B \mid X_n) = P(X_n, B)$, and therefore the distribution of the first component of $X^*$ coincides with that of the original chain $X$.
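
On a finite state space the splitting can be written out explicitly. The following sketch builds the residual kernel $Q$ for states in $V$ and checks the mixture identity (13.7.2); the transition matrix, the set `V`, the measure `mu` and the constant `p` are assumed example values, and `split_kernel` is a hypothetical helper name.

```python
def split_kernel(P, V, mu, p):
    """Return Q(x, .) = (P(x, .) - p * mu(.)) / (1 - p) for every x in V."""
    Q = {}
    for x in V:
        row = [(P[x][j] - p * mu[j]) / (1.0 - p) for j in range(len(mu))]
        if min(row) < -1e-12:
            # a negative entry means P(x, .) >= p * mu(.) fails, i.e. (II) is violated
            raise ValueError("condition (II) fails at state %d" % x)
        Q[x] = row
    return Q

# Assumed example values
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
V = [0, 1]
mu = [0.5, 0.5, 0.0]      # probability measure minorising the rows of P on V
p = 0.5                   # minorisation constant, n0 = 1

Q = split_kernel(P, V, mu, p)
for x in V:
    mix = [p * mu[j] + (1 - p) * Q[x][j] for j in range(3)]
    print(x, Q[x], mix)   # mix reproduces the row P[x], as in (13.7.2)
```

At a state of $V$ the next value is drawn from `mu` with probability `p` and from `Q[x]` otherwise, which leaves the one-step law of the first component unchanged.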

As we have already noted, the “extended” process $X^*(n)$ possesses the following property: the conditional distribution $\mathbf{P}(X_{n+1}^* \in (B, \delta) \mid X_n^*)$ does not depend on $X^*(n)$ on the set $X_n^* \in V^* := (V, 1)$ and is there the known distribution $\mu(B)\, \mathbf{P}(\omega(1) = \delta)$. This just means that visits of the chain $X^*$ to the set $V^*$ divide the trajectory of $X^*$ into independent cycles, in the same way as it happens in the presence of a positive atom.

We described above how one constructs the distribution of $X^*$ from that of $X$. Now we will give obvious relations reconstructing the distribution of $X$ from that of the chain $X^*$:

$$\mathbf{P}(X_n(x) \in B) = p\, \mathbf{P}(X_n^*((x, 1)) \in B^*) + (1 - p)\, \mathbf{P}(X_n^*((x, 0)) \in B^*), \tag{13.7.3}$$

where $B^* := (B, 0) \cup (B, 1)$. Note also that, if we consider $X_n$ as a component of $X_n^*$, we need to write it as a function $X_n(x^*)$ of the initial value $x^* \in \mathcal{X}^*$.

Put

$$\tau^* := \min\{k \ge 1 : X_k^*(x^*) \in V^*\}, \quad x^* \in V^* = (V, 1).$$

It is obvious that $\tau^*$ does not depend on the value $x^* = (x, 1)$, since $X_1(x^*)$ has the distribution $\mu$ for any $x \in V$. This property allows one to identify the set $V^*$ with a single point. In other words, one needs to consider one more state space $\mathcal{X}^{**}$ which is obtained from $\mathcal{X}^*$ if we replace the set $V^* = (V, 1)$ by a point to be denoted by $x_0$. In the new state space, we construct a chain $X^{**}$ equivalent to $X^*$ using the obvious relations for the transition probability $P^{**}$:

$$P^{**}(x^*, (B, \delta)) := P^*(x^*, (B, \delta)) \quad \text{for } x^* \notin (V, 1) = V^*,$$
$$P^{**}(x_0, (B, \delta)) := p\mu(B), \qquad P^{**}(x^*, x_0) := P^*(x^*, V^*).$$

Thus we have constructed a chain $X^{**}$ with the transition function $P^{**}$, and this chain has atom $x_0$. Clearly, $\tau^* = \min\{k \ge 1 : X_k^{**}(x_0) = x_0\}$. We now prove that this atom is positive. Put

$$E := \sup_{x \in V} \mathbf{E}\tau_V(x).$$

Lemma 13.7.1 $\mathbf{E}\tau^* \le \dfrac{2}{p} E$.


Proof Consider the evolution of the first component $X_k(x^*)$ of the process $X_k^*(x^*)$, $x^* \in V^*$. Partition the time axis $k \ge 0$ into intervals by hitting the set $V$ by $X_k(x^*)$.

Let $\tau_1 \ge 1$ be the first such hitting time (recall that $X_1(x^*) \subset= X_0(\mu)$ has the distribution $\mu$, so that $\tau_1 = 1$ if $\mu(V) = 1$). Prior to time $\tau_1$ (in the case $\tau_1 > 1$) transitions of $X_k(x^*)$, $k \ge 2$, were governed by the transition function $P(y, B)$, $y \in V^c = \mathcal{X} \setminus V$. At time $\tau_1$, according to the definition of $X^*$, one carries out a Bernoulli trial independent of the past history of the process with success (which is the event $\omega(\tau_1) = 1$) probability $p$. If $\omega(\tau_1) = 1$ then $\tau^* = \tau_1$. If $\omega(\tau_1) = 0$ then the transition from $X_{\tau_1}(x^*)$ to $X_{\tau_1 + 1}(x^*)$ is governed by the transition function $Q(y, B) = (P(y, B) - p\mu(B))/(1 - p)$, $y \in V$. The further evolution of the chain is similar: if $\tau_1 + \tau_2$ is the time of the second visit of $X(x^*, k)$ to $V$ (in the case $\omega(\tau_1) = 0$) then in the time interval $[\tau_1 + 1, \tau_1 + \tau_2]$ transitions of $X(x^*, k)$ occur according to the transition function $P(y, B)$, $y \in V^c$. At time $\tau_1 + \tau_2$ one carries out a new Bernoulli trial with the outcome $\omega(\tau_1 + \tau_2)$. If $\omega(\tau_1 + \tau_2) = 1$, then $\tau^* = \tau_1 + \tau_2$. If $\omega(\tau_1 + \tau_2) = 0$, then the transition from $X(x^*, \tau_1 + \tau_2)$ to $X(x^*, \tau_1 + \tau_2 + 1)$ is governed by $Q(y, B)$, and so on.

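In other words, the evolution of the component $X_k(x^*)$ of the process $X_k^*(x^*)$ is as follows. Let $\widetilde{X} = \{\widetilde{X}_k\}$, $k = 1, 2, \ldots$, be a Markov chain with the distribution $\mu$ at time $k = 1$ and transition probability $\widetilde{Q}(x, B)$ at times $k \ge 2$,

$$\widetilde{Q}(x, B) = \begin{cases} (P(x, B) - p\mu(B))/(1 - p) & \text{if } x \in V, \\ P(x, B) & \text{if } x \in V^c. \end{cases}$$

Define $T_i$ as follows:

$$T_0 := 0, \qquad T_1 = \tau_1 = \min\{k \ge 1 : \widetilde{X}_k \in V\},$$
$$T_i := \tau_1 + \cdots + \tau_i = \min\{k > T_{i-1} : \widetilde{X}_k \in V\}, \quad i \ge 2.$$

Let, further, $\nu$ be a random variable independent of $\widetilde{X}$ and having the geometric distribution

$$\mathbf{P}(\nu = k) = (1 - p)^{k-1} p, \quad k \ge 1, \qquad \nu = \min\{k \ge 1 : \omega(T_k) = 1\}. \tag{13.7.4}$$

Then it follows from the aforesaid that the distribution of $X_1(x^*), \ldots, X_{\tau^*}(x^*)$ coincides with that of $\widetilde{X}_1, \ldots, \widetilde{X}_{T_\nu}$; in particular, $\tau^* = T_\nu$, and

$$\mathbf{E}\tau^* = \sum_{k=1}^{\infty} p(1 - p)^{k-1}\, \mathbf{E}T_k.$$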

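Further, since $\mu(B) \le P(x, B)/p$ for $x \in V$, then, for any $x \in V$,

$$\mathbf{E}\tau_1 = \mu(V) + \int_{V^c} \mu(du)\,(1 + \mathbf{E}\tau_V(u)) \le \frac{1}{p}\Big[P(x, V) + \int_{V^c} P(x, du)\,(1 + \mathbf{E}\tau_V(u))\Big] = \frac{\mathbf{E}\tau_V(x)}{p} \le \frac{E}{p}.$$

To bound $\mathbf{E}\tau_i$ for $i \ge 2$, we note that $Q(x, B) \le (1 - p)^{-1} P(x, B)$ for $x \in V$. Therefore, if we denote by $\mathfrak{F}_{(i)}$ the $\sigma$-algebra generated by $\{\widetilde{X}_k, \omega(\tau_k)\}$ for $k \le T_i$, then

$$
\begin{aligned}
\mathbf{E}(\tau_i \mid \mathfrak{F}_{(i-1)}) &\le \sup_{x \in V}\Big[Q(x, V) + \int_{V^c} Q(x, du)\,(1 + \mathbf{E}\tau_V(u))\Big] \\
&\le \frac{1}{1 - p} \sup_{x \in V}\Big[P(x, V) + \int_{V^c} P(x, du)\,(1 + \mathbf{E}\tau_V(u))\Big] \\
&= (1 - p)^{-1} \sup_{x \in V} \mathbf{E}\tau_V(x) = E(1 - p)^{-1}.
\end{aligned}
$$

This implies the inequality $\mathbf{E}T_k \le E\big(1/p + (k - 1)/(1 - p)\big)$, from which we obtain that

$$\mathbf{E}\tau^* \le E\Big[\frac{1}{p} + p \sum_{k=1}^{\infty} (k - 1)(1 - p)^{k-2}\Big] = \frac{2E}{p}.$$

The lemma is proved. □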
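
The final step in the proof of Lemma 13.7.1 can be checked by hand. The computation below is a short worked sketch that uses only the bound on $\mathbf{E}T_k$ from the proof and the standard series $\sum_{j \ge 0} j q^{j-1} = (1 - q)^{-2}$ for $|q| < 1$, applied with $q = 1 - p$.

```latex
\begin{align*}
\mathbf{E}\tau^*
  &= \sum_{k=1}^{\infty} p(1-p)^{k-1}\,\mathbf{E}T_k
   \le E \sum_{k=1}^{\infty} p(1-p)^{k-1}\Big(\frac{1}{p} + \frac{k-1}{1-p}\Big) \\
  &= \frac{E}{p}\sum_{k=1}^{\infty} p(1-p)^{k-1}
     + E\,p \sum_{k=1}^{\infty} (k-1)(1-p)^{k-2} \\
  &= \frac{E}{p} + E\,p \cdot \frac{1}{p^{2}} = \frac{2E}{p}.
\end{align*}
```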


We return to the proof of the theorem. To make use of Theorem 13.6.1, we now have to show that $\mathbf{P}(\tau^*(x^*) < \infty) = 1$ for any $x^* \in \mathcal{X}^*$, where

$$\tau^*(x^*) := \min\{k \ge 1 : X_k^*(x^*) \in V^*\}.$$

But the chain $X$ visits $V$ with probability 1. After $\nu$ visits to $V$ ($\nu$ was defined in (13.7.4)), the process $X_n^* = (X(n), \omega(n))$ will be in the set $V^*$.

The aperiodicity condition for $n_0 = 1$ will be met if $\mu(V) > 0$. In that case we obtain by virtue of Theorem 13.6.1 that there exists a unique invariant measure $\pi^*$ such that, for any $x^* \in \mathcal{X}^*$,

$$
\begin{gathered}
\|P^*(x^*, n, \cdot) - \pi^*(\cdot)\| \to 0, \qquad \pi^*((B, \delta)) = \frac{1}{\mathbf{E}\tau^*} \sum_{k=1}^{\infty} P^*_{V^*}(k, (B, \delta)), \\
P^*_{V^*}(k, (B, \delta)) = \mathbf{P}\big(X_k^*(x^*) \in (B, \delta),\ X_1^*(x^*) \notin V^*, \ldots, X_{k-1}^*(x^*) \notin V^*\big).
\end{gathered}
\tag{13.7.5}
$$

In the last equality, we can take any point $x^* \in V^*$; the probability does not depend on the choice of $x^* \in V^*$.

From this and the “inversion formula” (13.7.3) we obtain assertion (13.7.1) and a representation for the invariant measure $\pi$ of the process $X$.

The proof of the convergence $\|P(\mu_0, n, \cdot) - \pi(\cdot)\| \to 0$ and uniqueness of the invariant measure is exactly the same as in Theorem 13.6.1 (these facts also follow from the respective assertions for $X^*$).



Verifying the conditions of Theorem 13.6.1 in the case where $n_0 > 1$ or $\mu(V) = 0$ causes no additional difficulties and we leave it to the reader.

The theorem is proved. □

Note  that  in  a  way  similar  to  that  in  the  proof  of  Theorem  13.4.1,  one  could  also 
establish  the  uniqueness  of  the  solution  to  the  integral  equation  for  the  invariant 
measure  (see  Definition  13.6.6)  in  a  wider  class  of  signed  finite  measures. 

The main and most difficult to verify conditions of Theorem 13.7.1 are undoubtedly conditions (I) and (II). Condition (I$_0$) is usually obtained “automatically”, in the course of verifying condition (I), for the latter requires bounding $\mathbf{E}\tau_V(x)$ for all $x$. Verifying the aperiodicity condition (III) usually causes no difficulties. If, say, recurrence to the set $V$ is possible in $m_1$ and $m_2$ steps and g.c.d.$(m_1, m_2) = 1$, then the chain is aperiodic.


13.7.2  On  Conditions  (I)  and  (II) 

Now we consider in more detail the main conditions (I) and (II). Condition (II) is expressed directly in terms of local characteristics of the chain (transition probabilities in one or a fixed number of steps $n_0 \ge 1$), and in this sense it could be treated as a “final” one. One only needs to “guess” the most appropriate set $V$ and measure $\mu$ (of course, if there are any). For example, for multi-dimensional Markov chains in $\mathcal{X} = \mathbb{R}^d$, condition (II) will be satisfied if at least one of the following two conditions is met.

(II$_a$) The distribution of $X_{n_0}(x)$ has, for some $n_0$ and $N > 0$ and all $x \in V_N := \{y : |y| \le N\}$, a component which is absolutely continuous with respect to Lebesgue measure (or to the sum of the Lebesgue measures on $\mathbb{R}^d$ and its “coordinate” subspaces) and is “uniformly” positive on the set $V_M$ for some $M > 0$. In this case, one can take $\mu$ to be the uniform distribution on $V_M$.

(II$_l$) $\mathcal{X} = \mathbb{Z}^d$ is the integer lattice in $\mathbb{R}^d$. In this case the chain is countable and everything simplifies (see Sect. 13.4).

We have already noted that, in the cases when a chain has a positive atom, which is the case in Example 13.6.2, no assumptions about the structure (smoothness) of the distribution of $X_{n_0}(x)$ are needed.

The “positivity” condition (I) is different. It is given in terms of rather complicated characteristics $\mathbf{E}\tau_V(x)$ requiring additional analysis and a search for conditions in terms of local characteristics which would ensure (I). The rest of the section will mostly be devoted to this task.

First  of  all,  we  will  give  an  “intermediate”  assertion  which  will  be  useful  for  the 
sequel.  We  have  already  made  use  of  such  an  assertion  in  Example  13.6.2. 

Theorem 13.7.2 Suppose there exists a nonnegative measurable function $g : \mathcal{X} \to \mathbb{R}$ such that the following conditions (I$_g$) are met:

(I$_g$)$_1$ $\mathbf{E}\tau_V(x) \le c_1 + c_2 g(x)$ for $x \in V^c = \mathcal{X} \setminus V$, $c_1, c_2 = \text{const}$.

(I$_g$)$_2$ $\sup_{x \in V} \mathbf{E}g(X_1(x)) < \infty$.

Then conditions (I$_0$) and (I) are satisfied.

The function $g$ from Theorem 13.7.2 is often called the test, or Lyapunov, function. For brevity’s sake, put $\tau_V(x) := \tau(x)$.

Proof If (I$_g$) holds then, for $x \in V$,

$$
\begin{aligned}
\mathbf{E}\tau(x) &\le 1 + \mathbf{E}\big[\tau(X_1(x));\ X_1(x) \in V^c\big] \\
&\le 1 + \mathbf{E}\big(\mathbf{E}[\tau(X_1(x)) \mid X_1(x)];\ X_1(x) \in V^c\big) \\
&\le 1 + \mathbf{E}\big(c_1 + c_2 g(X_1(x));\ X_1(x) \in V^c\big) \\
&\le 1 + c_1 + c_2 \sup_{x \in V} \mathbf{E}g(X_1(x)) < \infty.
\end{aligned}
$$


The  theorem  is  proved.  □ 

Note that condition (I$_g$)$_2$, like condition (II), refers to “local” characteristics of the system, and in that sense it can also be treated as a “final” condition (up to the choice of function $g$).

We now consider conditions ensuring (I$_g$)$_1$. The processes

$$\{X_n\} = \{X_n(x)\}, \quad X_0(x) = x,$$

to be considered below (for instance, in Theorem 13.7.3) do not need to be Markovian. We will only use those properties of the processes which will be stated in conditions of assertions.

We will again make use of nonnegative trial functions $g : \mathcal{X} \to \mathbb{R}$ and consider a set $V$ “induced” by the function $g$ and a set $U$ which in most cases will be a bounded interval of the real line:

$$V := g^{-1}(U) = \{x \in \mathcal{X} : g(x) \in U\}.$$

The notation $\tau(x) = \tau_U(x)$ will retain its meaning:

$$\tau(x) := \min\{k \ge 1 : g(X_k(x)) \in U\} = \min\{k \ge 1 : X_k(x) \in V\}.$$

The next assertion is an essential element of Lyapunov’s (or the test functions) approach to the proof of positive recurrence of a Markov chain.

Theorem 13.7.3 If $\{X_n\}$ is a Markov chain and, for $x \in V^c$,

$$\mathbf{E}g(X_1(x)) - g(x) \le -\varepsilon, \tag{13.7.6}$$

then $\mathbf{E}\tau(x) \le g(x)/\varepsilon$ and therefore (I$_g$)$_1$ holds.


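To prove the theorem we need

Lemma 13.7.2 If, for some $\varepsilon > 0$, all $n = 0, 1, 2, \ldots$, and any $x \in V^c$,

$$\mathbf{E}\big(g(X_{n+1}) - g(X_n) \mid \tau(x) > n\big) \le -\varepsilon, \tag{13.7.7}$$

then

$$\mathbf{E}\tau(x) \le \frac{g(x)}{\varepsilon}, \quad x \in V^c,$$

and therefore (I$_g$)$_1$ holds.

Proof Put $\tau(x) := \tau$ for brevity and set

$$\tau_{(N)} := \min(\tau, N), \qquad \Delta(n) := g(X_{n+1}) - g(X_n).$$

We have

$$
\begin{aligned}
-g(x) = -\mathbf{E}g(X_0) &\le \mathbf{E}\big(g(X_{\tau_{(N)}}) - g(X_0)\big) \\
&= \mathbf{E} \sum_{n=0}^{\tau_{(N)} - 1} \Delta(n) = \sum_{n=0}^{N-1} \mathbf{E}\Delta(n)\, I(\tau > n) \\
&= \sum_{n=0}^{N-1} \mathbf{P}(\tau > n)\, \mathbf{E}(\Delta(n) \mid \tau > n) \le -\varepsilon \sum_{n=0}^{N-1} \mathbf{P}(\tau > n).
\end{aligned}
$$

This implies that, for any $N$,

$$\sum_{n=0}^{N-1} \mathbf{P}(\tau > n) \le \frac{g(x)}{\varepsilon}.$$

Therefore this inequality will also hold for $N = \infty$, so that $\mathbf{E}\tau \le g(x)/\varepsilon$. The lemma is proved. □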


Proof of Theorem 13.7.3 The proof follows in an obvious way from the fact that, by (13.7.6) and the homogeneity of the chain, $\mathbf{E}(g(X_{n+1}) - g(X_n) \mid X_n) \le -\varepsilon$ holds on $\{X_n \in V^c\}$, and from the inclusion $\{\tau > n\} \subset \{X_n \in V^c\}$, so that

$$\mathbf{E}\big(g(X_{n+1}) - g(X_n);\ \tau > n\big) = \mathbf{E}\big[\mathbf{E}(g(X_{n+1}) - g(X_n) \mid X_n);\ \tau > n\big] \le -\varepsilon\, \mathbf{P}(\tau > n).$$

The theorem is proved. □

Theorem 13.7.3 is a modification of the positive recurrence criterion known as the Foster-Moustafa-Tweedie criterion (see, e.g., [6, 27]).
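
The bound of Theorem 13.7.3 can be compared with a simulation. The sketch below is an illustration under assumptions: it uses a simple integer-valued walk with steps +1 and -1 (up-probability 0.3), the test function $g(x) = x$, the set $V = \{0, 1, 2\}$ and the hypothetical helper `hitting_time`; none of these choices come from the text.

```python
import random

def hitting_time(x, level, rng, p_up=0.3):
    """Number of steps until the walk x -> x +/- 1 first enters V = {0, ..., level}."""
    steps = 0
    while True:
        x += 1 if rng.random() < p_up else -1
        steps += 1
        if x <= level:
            return steps

rng = random.Random(1)
x, level, p_up = 10, 2, 0.3
eps = 1 - 2 * p_up     # outside V the drift is E g(X_1(x)) - g(x) = -(1 - 2 * p_up)
runs = 5000
estimate = sum(hitting_time(x, level, rng, p_up) for _ in range(runs)) / runs
bound = x / eps        # Theorem 13.7.3: E tau(x) <= g(x) / eps
print(estimate, bound)
```

The empirical mean stays below the bound; the bound is not sharp here because $g$ is still positive at the moment the walk enters $V$.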

Consider some applications of the obtained results. Let $X$ be a Markov chain on the real half-axis $\mathbb{R}_+ = [0, \infty)$. For brevity’s sake, put $\xi(x) := X_1(x) - x$. This is the one-step increment of the chain starting at the point $x$; we could also define $\xi(x)$ as a random variable with the distribution

$$\mathbf{P}(\xi(x) \in B) = P(x, B - x) \qquad (B - x = \{y \in \mathcal{X} : y + x \in B\}).$$

Corollary 13.7.1 If, for some $N \ge 0$ and $\varepsilon > 0$,

$$\sup_{x \le N} \mathbf{E}\xi(x) < \infty, \qquad \sup_{x > N} \mathbf{E}\xi(x) \le -\varepsilon, \tag{13.7.8}$$

then conditions (I$_0$) and (I) hold for $V = [0, N]$.

Proof Make use of Theorems 13.7.2, 13.7.3 and Corollary 13.3.1 with $g(x) = x$, $V = [0, N]$. Conditions (I$_g$)$_2$ and (13.7.6) are clearly satisfied. □

Thus the presence of a “negative drift” in the region $x > N$ guarantees positivity of the chain. However, condition (I) can also be ensured when the “drift” $\mathbf{E}\xi(x)$ vanishes as $x \to \infty$.

Corollary 13.7.2 Let $\sup_x \mathbf{E}\xi^2(x) < \infty$ and

$$\mathbf{E}\xi^2(x) \le \beta, \qquad \mathbf{E}\xi(x) \le -\frac{c}{x} \quad \text{for } x > N.$$

If $2c > \beta$ then conditions (I$_0$) and (I) hold for $V = [0, N]$.

Proof We again make use of Theorems 13.7.2 and 13.7.3, but with $g(x) = x^2$. We have for $x > N$:

$$\mathbf{E}g(X_1(x)) - g(x) = \mathbf{E}\big(2x\xi(x) + \xi^2(x)\big) \le -2c + \beta < 0.$$

□


Before proceeding to examples related to ergodicity we note the following. The “larger” the set $V$ the easier it is to verify condition (I), and the “smaller” that set, the easier it is to verify condition (II). In this connection there arises the question of when one can consider two sets: a “small” set $W$ and a “large” set $V \supset W$ such that if (I) holds for $V$ and (II) holds for $W$ then both (I) and (II) would hold for $W$. Under conditions of Sect. 13.6 one can take $W$ to be a “one-point” atom $x_0$.

Lemma 13.7.3 Let sets $V$ and $W$ be such that the condition

(I$_V$) $E := \sup_{x \in V} \mathbf{E}\tau_V(x) < \infty$

holds and there exists an $m$ such that

$$\inf_{x \in V} \mathbf{P}\Big(\bigcup_{j=1}^{m} \{X_j(x) \in W\}\Big) \ge q > 0.$$

Then the following condition is also met:

(I$_W$) $\sup_{x \in W} \mathbf{E}\tau_W(x) \le \sup_{x \in V} \mathbf{E}\tau_W(x) \le \dfrac{mE}{q}$.

Thus, under the assumptions of Lemma 13.7.3, if condition (I) holds for $V$ and condition (II) holds for $W$, then conditions (I) and (II) hold for $W$.

To prove Lemma 13.7.3, we will need the following assertion extending (in the form of an inequality) the well-known Wald identity.

Assume we are given a sequence of nonnegative random variables $\tau_1, \tau_2, \ldots$ which are measurable with respect to $\sigma$-algebras $\mathfrak{U}_1 \subset \mathfrak{U}_2 \subset \cdots$, respectively, and let $T_n := \tau_1 + \cdots + \tau_n$. Furthermore, let $\nu$ be a given stopping time with respect to $\{\mathfrak{U}_n\}$: $\{\nu \le n\} \in \mathfrak{U}_n$.

Lemma 13.7.4 If $\mathbf{E}(\tau_n \mid \mathfrak{U}_{n-1}) \le a$ then $\mathbf{E}T_\nu \le a\mathbf{E}\nu$.

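Proof We can assume without loss of generality that $\mathbf{E}\nu < \infty$ (otherwise the inequality is trivial). The proof essentially repeats that of Theorem 4.4.1. One has

$$\mathbf{E}T_\nu = \sum_{k=1}^{\infty} \mathbf{E}(T_k;\ \nu = k) = \sum_{k=1}^{\infty} \mathbf{E}(\tau_k;\ \nu \ge k). \tag{13.7.9}$$

Changing the summation order here is well-justified, for the summands are nonnegative. Further, $\{\nu \le k - 1\} \in \mathfrak{U}_{k-1}$ and hence $\{\nu \ge k\} \in \mathfrak{U}_{k-1}$. Therefore

$$\mathbf{E}(\tau_k;\ \nu \ge k) = \mathbf{E}\, I(\nu \ge k)\, \mathbf{E}(\tau_k \mid \mathfrak{U}_{k-1}) \le a\, \mathbf{P}(\nu \ge k).$$

Comparing this with (13.7.9) we get

$$\mathbf{E}T_\nu \le a \sum_{k=1}^{\infty} \mathbf{P}(\nu \ge k) = a\mathbf{E}\nu.$$

The lemma is proved. □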


Proof of Lemma 13.7.3 Suppose the chain starts at a point $x \in V$. Consider the times $T_1, T_2, \ldots$ of successive visits of $X$ to $V$, $T_0 = 0$. Put $Y_0 := x$, $Y_k := X_{T_k}(x)$, $k = 1, 2, \ldots$. Then, by virtue of the strong Markov property, the sequence $(Y_k, T_k)$ will form a Markov chain. Set $\mathfrak{U}_k := \sigma(T_1, \ldots, T_k;\ Y_1, \ldots, Y_k)$, $\tau_k := T_k - T_{k-1}$, $k = 1, 2, \ldots$. Then $\nu := \min\{k : Y_k \in W\}$ is a stopping time with respect to $\{\mathfrak{U}_k\}$. It is evident that $\mathbf{E}(\tau_k \mid \mathfrak{U}_{k-1}) \le E$. Bound $\mathbf{E}\nu$. We have

$$
\begin{aligned}
p_k := \mathbf{P}(\nu > km) &\le \mathbf{P}\Big(\bigcap_{j=1}^{T_{km}} \{X_j \notin W\}\Big) \\
&= \mathbf{E}\Big[ I\Big(\bigcap_{j=1}^{T_{(k-1)m}} \{X_j \notin W\}\Big)\, \mathbf{E}\Big( I\Big(\bigcap_{j=T_{(k-1)m}+1}^{T_{km}} \{X_j \notin W\}\Big) \,\Big|\, \mathfrak{U}_{(k-1)m}\Big)\Big].
\end{aligned}
$$

Since $\tau_j \ge 1$, the last factor, by the assumptions of the lemma and the strong Markov property, does not exceed

$$\mathbf{P}\Big(\bigcap_{j=1}^{m} \big\{X_j^{\text{new}}(X_{T_{(k-1)m}}) \notin W\big\}\Big) \le 1 - q,$$

where, as before, $X_k^{\text{new}}(x)$ is a chain with the same distribution as $X_k(x)$ but independent of the latter chain. Thus $p_k \le (1 - q) p_{k-1} \le (1 - q)^k$, $\mathbf{E}\nu \le m/q$, and by Lemma 13.7.4 we have $\mathbf{E}T_\nu \le Em/q$. It remains to notice that $\tau_W(x) = T_\nu$. The lemma is proved. □

Example  13. 7.1  A  random  walk  with  reflection.  Let  ,  §2, . . .  be  independent  iden¬ 
tically  distributed  random  variables, 


Xn+i  •—  \Xn  +  ^n+\ l>  n  —  0,1,....  (13.7.10) 

If  the  ^k  and  hence  the  Xk  are  non-arithmetic,  then  the  chain  X  has,  generally 
speaking,  no  atoms.  If,  for  instance,  ^  have  a  density  fit )  with  respect  to  Lebesgue 
measure  then  P(Xk(x)  =  y)  =  0  for  any  x,y,k  >  1.  We  will  assume  that  a  broader 
condition  (A)  holds: 

(A).  In  the  decomposition 

P (&  <0  =  Pa  Fa(t )  +  Pc  Fflt) 

of  the  distribution  of  ^  into  the  absolutely  continuous  iEa)  and  singular  (Fc)  (in- 
cluding  discrete)  components,  one  has  pa  >  0. 

Corollary 13.7.3 If condition (A) holds, $a = \mathbf{E}\xi_k < 0$, and $\mathbf{E}|\xi_k| < \infty$, then the Markov chain defined in (13.7.10) satisfies the conditions of Theorem 13.7.2 and therefore is ergodic in the sense of convergence in total variation.

Proof We first verify that the chain satisfies the conditions of Corollary 13.7.1. Since in our case $|X_1(x) - x| \le |\xi_1|$, the first of conditions (13.7.8) is satisfied. Further,

$$\mathbf{E}\xi(x) = \mathbf{E}|x + \xi_1| - x = \mathbf{E}(\xi_1;\ \xi_1 \ge -x) - \mathbf{E}(2x + \xi_1;\ \xi_1 < -x) \to \mathbf{E}\xi_1$$

as $x \to \infty$, since

$$x\,\mathbf{P}(\xi_1 < -x) \le \mathbf{E}(|\xi_1|;\ |\xi_1| > x) \to 0.$$

Hence there exists an $N$ such that $\mathbf{E}\xi(x) \le a/2 < 0$ for $x \ge N$. This proves that conditions (I$_0$) and (I) hold for $V = [0, N]$.

Now verify that condition (II) holds for the set $W = [0, h]$ with some $h$. Let $f(t)$ be the density of the distribution $F_a$ from condition (A). There exist an $f_0 > 0$ and a segment $[t_1, t_2]$, $t_2 > t_1$, such that $f(t) > f_0$ for $t \in [t_1, t_2]$. The density of $x + \xi_1$ will clearly be greater than $f_0$ on $[x + t_1, x + t_2]$. Put $h := (t_2 - t_1)/2$. Then, for $0 \le x \le h$, one will have $[t_2 - h, t_2] \subset [x + t_1, x + t_2]$.

Suppose first that $t_2 > 0$. The aforesaid will then mean that the density of $x + \xi_1$ will be greater than $f_0$ on $[(t_2 - h)^+, t_2]$ for all $x \le h$ and, therefore,

$$\inf_{x \in W} \mathbf{P}(X_1(x) \in B) \ge p_a \int_B f_0(t)\,dt,$$

where

$$f_0(t) = \begin{cases} f_0 & \text{if } t \in [(t_2 - h)^+, t_2], \\ 0 & \text{otherwise.} \end{cases}$$

This means that condition (II) is satisfied on the set $W = [0, h]$. The case $t_2 \le 0$ can be considered in a similar way.

It remains to make use of Lemma 13.7.3 which implies that condition (I) will hold for the set $W$. The condition of Lemma 13.7.3 is clearly satisfied (for sufficiently large $m$, the distribution of $X_m(x)$, $x \le N$, will have an absolutely continuous component which is positive on $W$). For the same reason, the chain $X$ cannot be periodic. Thus all conditions of Theorem 13.7.2 are met. The corollary is proved. □
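The drift computation in the first part of the proof can be checked numerically. The sketch below is only an illustration under an assumed increment law: it takes $\xi_1$ uniform on $[-2, 1]$ (so $a = \mathbf{E}\xi_1 = -1/2$), which is not a choice made in the text, and evaluates $\mathbf{E}\xi(x) = \mathbf{E}|x + \xi_1| - x$ by a midpoint rule. The function name `drift` and its parameters are hypothetical.

```python
def drift(x, lo=-2.0, hi=1.0, steps=30000):
    """Approximate E|x + xi| - x for xi uniform on [lo, hi] (assumed law, midpoint rule)."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * width       # midpoint of the i-th cell
        total += abs(x + t)
    mean_abs = total * width / (hi - lo)  # the uniform density is 1 / (hi - lo)
    return mean_abs - x


a = (-2.0 + 1.0) / 2                      # E xi = -0.5 < 0 for the assumed law
for x in (0.0, 0.5, 1.0, 1.5, 2.0, 5.0):
    d = drift(x)
    print(f"x = {x:3.1f}   drift = {d:+.4f}   drift <= a/2: {d <= a / 2}")
```

For this particular law the drift is positive near the reflecting point and equals $a$ exactly once $x \ge 2$, so a set $V = [0, N]$ with any $N \ge 2$ would do in the argument above.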


Example 13.7.2 An oscillating random walk. Suppose we are given two independent sequences $\xi_1, \xi_2, \ldots$ and $\eta_1, \eta_2, \ldots$ of independent random variables, identically distributed in each of the sequences. Put

$$X_{n+1} := \begin{cases} X_n + \xi_{n+1} & \text{if } X_n \ge 0, \\ X_n + \eta_{n+1} & \text{if } X_n < 0. \end{cases} \tag{13.7.11}$$

Such a random walk is called oscillating. It clearly forms a Markov chain in the state space $\mathcal{X} = (-\infty, \infty)$.


Corollary 13.7.4 If at least one of the distributions of $\xi_k$ or $\eta_k$ satisfies condition (A) and $-\infty < \mathbf{E}\xi_k < 0$, $\infty > \mathbf{E}\eta_k > 0$, then the chain (13.7.11) will satisfy the conditions of Theorem 13.7.2 and therefore will be ergodic.

Proof The argument is quite similar to the proof of Corollary 13.7.3. One just needs to take, in order to verify condition (I), $g(x) = |x|$ and $V = [-N, N]$. After that it remains to make use of Lemma 13.7.3 with $W = [0, h]$ if condition (A) is satisfied for $\xi_k$ (and with $W = [-h, 0)$ if it is met for $\eta_k$). □


Note that condition (A) in Examples 13.7.1 and 13.7.2 can be relaxed to that of the existence of an absolutely continuous component for the distribution of the sum $\sum_{j=1}^{m} \xi_j$ (or $\sum_{j=1}^{m} \eta_j$) for some $m$. On the other hand, if the distributions of these sums are singular for all $m$, then convergence of the distributions $\mathbf{P}(x, n, \cdot)$ in total variation cannot take place. If, for instance, one has $\mathbf{P}(\xi_k = -\sqrt{2}) = \mathbf{P}(\xi_k = 1) = 1/2$ in Example 13.7.1, then $\mathbf{E}\xi_k < 0$ and condition (I) will be met, while condition (II) will not. Convergence of $\mathbf{P}(x, n, \cdot)$ in total variation to the limiting distribution $\pi$ is also impossible. Indeed, it follows from the equation for the invariant distribution $\pi$ that this distribution is necessarily continuous. On the other hand, say, the distributions $\mathbf{P}(0, n, \cdot)$ are concentrated on the countable set $N$ of the numbers $|-k\sqrt{2} + l|$; $k, l = 1, 2, \ldots$. Therefore $\mathbf{P}(0, n, N) = 1$ for all $n$, while $\pi(N) = 0$. Hence only weak convergence of the distributions $\mathbf{P}(x, n, \cdot)$ to $\pi(\cdot)$ may take place. And although this convergence does not raise any doubts, we know no reasonably simple proof of this fact.

Example 13.7.3 (continuation of Examples 13.4.2 and 13.6.1) Let $\mathcal{X} = [0, 1)$, $\xi_1, \xi_2, \ldots$ be independent and identically distributed, and $X_{n+1} := X_n + \xi_{n+1} \pmod 1$ or, which is the same, $X_{n+1} := \{X_n + \xi_{n+1}\}$, where $\{x\}$ denotes the fractional part of $x$. Here, condition (I) is clearly met for $V = \mathcal{X} = [0, 1)$. If the $\xi_k$ satisfy condition (A) then, as was the case in Example 13.7.1, condition (II) will be met for the set $W = [0, h]$ with some $h > 0$, which, together with Lemma 13.7.3, will mean, as before, that the conditions of Theorem 13.7.2 are satisfied. The invariant distribution $\pi$ will in this example be uniform on $[0, 1]$. For simplicity's sake, we can assume that the distribution of $\xi_k$ has a density $f(t)$, and without loss of generality we can suppose that $\xi_k \in [0, 1]$ ($f(t) = 0$ for $t \notin [0, 1]$). Then the density $p(x) \equiv 1$ of the invariant measure $\pi$ will satisfy the equation for the invariant measure:

$$p(x) = 1 = \int_0^x dy\, f(x - y) + \int_x^1 dy\, f(x - y + 1) = \int_0^1 f(y)\,dy.$$

Since the stationary distribution is unique, one has $\pi = \mathbf{U}_{0,1}$. Moreover, by Theorem A3.4.1 of Appendix 3, along with convergence of $\mathbf{P}(x, n, \cdot)$ to $\mathbf{U}_{0,1}$ in total variation, convergence of the densities $\mathbf{P}(x, n, dt)/dt$ to 1 in (Lebesgue) measure will take place.

The fact that the invariant distribution is uniform remains true for arbitrary non-lattice distributions of $\xi_k$. However, as we have already mentioned in Example 13.6.1, in the general case (without condition (A)) only weak convergence of the distributions $\mathbf{P}(x, n, \cdot)$ to the uniform distribution is possible (see [6, 18]).
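As an illustration of the invariant-measure equation in Example 13.7.3, the sketch below evaluates its right-hand side for the constant density $p(y) = 1$ and one assumed increment density, $f(t) = 2t$ on $[0, 1]$. Both this density and the helper names `integrate` and `invariant_rhs` are hypothetical choices made for the example, not taken from the text.

```python
def f(t):
    """Assumed increment density on [0, 1]: f(t) = 2t (zero elsewhere)."""
    return 2.0 * t if 0.0 <= t <= 1.0 else 0.0


def integrate(g, lo, hi, steps=2000):
    """Midpoint rule for the integral of g over [lo, hi]."""
    if hi <= lo:
        return 0.0
    width = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * width) for i in range(steps)) * width


def invariant_rhs(x):
    """Integral of p(y) f(increment) dy with p = 1: the mass arriving at the point x."""
    no_wrap = integrate(lambda y: f(x - y), 0.0, x)      # y <= x: increment x - y
    wrap = integrate(lambda y: f(x - y + 1.0), x, 1.0)   # y > x: increment x - y + 1
    return no_wrap + wrap


for x in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f"x = {x:3.1f}   right-hand side = {invariant_rhs(x):.6f}")
```

Whatever density is substituted for `f`, the two integrals together cover one full period of the increment, which is why the result does not depend on $x$.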


13.8  Laws  of  Large  Numbers  and  the  Central  Limit  Theorem  for 
Sums  of  Random  Variables  Defined  on  a  Markov  Chain 

13.8.1  Random  Variables  Defined  on  a  Markov  Chain 

Let, as before, $X = \{X_n\}$ be a Markov chain in an arbitrary measurable state space $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ defined in Sect. 13.6, and let a measurable function $f : \mathcal{X} \to \mathbb{R}$ be given on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$. The sequence of sums

$$S_n := \sum_{k=1}^{n} f(X_k) \tag{13.8.1}$$

is a generalisation of the random walks that were studied in Chaps. 8 and 11. One can consider an even more general problem on the behaviour of sums of random variables defined on a Markov chain. Namely, we will assume that a collection of distributions $\{\mathbf{F}_x\}$ is given which depend on the parameter $x \in \mathcal{X}$. If $F_x^{(-1)}(t)$ is the quantile transform of $\mathbf{F}_x$ and $\omega \mathrel{\subset\!\!\!=} \mathbf{U}_{0,1}$, then $\xi_x := F_x^{(-1)}(\omega)$ will have the distribution $\mathbf{F}_x$ (see Sect. 3.2.4).

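The mapping $\mathbf{F}_x$ of the space $\mathcal{X}$ into the set of distributions is assumed to be such that the function $\xi_x(t) = F_x^{(-1)}(t)$ is measurable on $\mathcal{X} \times \mathbb{R}$ with respect to $\mathfrak{B}_{\mathcal{X}} \times \mathfrak{B}$, where $\mathfrak{B}$ is the $\sigma$-algebra of Borel sets on the real line. In this case, $\xi_x(\omega)$ will be a random variable such that the moments

$$\mathbf{E}\xi_x^s = \int_{-\infty}^{\infty} v^s\, dF_x(v) = \int_0^1 \big[F_x^{(-1)}(u)\big]^s\, du$$

are measurable with respect to $\mathfrak{B}_{\mathcal{X}}$ (and hence will be random variables themselves if we set a distribution on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$).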

Definition 13.8.1 If $\omega_i \mathrel{\subset\!\!\!=} \mathbf{U}_{0,1}$ are independent then the sequence

$$\xi_{X_n} := F_{X_n}^{(-1)}(\omega_n), \quad n = 0, 1, \ldots,$$

is called a sequence of random variables defined on the Markov chain $\{X_n\}$.

The basic objects of study in this section are the asymptotic properties of the distributions of the sums

$$S_n := \sum_{k=0}^{n} \xi_{X_k}. \tag{13.8.2}$$

If the distribution $\mathbf{F}_x$ is degenerate and concentrated at the point $f(x)$ then (13.8.2) turns into the sum (13.8.1). If the chain $X$ is countable with states $E_0, E_1, \ldots$ and $f(x) = \mathrm{I}(E_j)$ then $S_n = m_j(n)$ is the number of visits to the state $E_j$ by the time $n$ considered in Theorem 13.4.4.
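The construction in Definition 13.8.1 can be sketched in code. The example below is a minimal illustration under assumptions that are not in the text: a two-state chain with the hypothetical transition table `P`, and exponential laws $\mathbf{F}_x$ with the hypothetical rates `RATES[x]`, whose quantile transform is $F_x^{(-1)}(u) = -\ln(1-u)/\lambda_x$.

```python
import math
import random

RATES = {0: 1.0, 1: 4.0}             # assumption: F_x is exponential with rate RATES[x]
P = {0: [0.7, 0.3], 1: [0.4, 0.6]}   # assumption: transition probabilities of a two-state chain


def quantile(x, u):
    """Quantile transform F_x^{(-1)}(u) of the exponential law with rate RATES[x]."""
    return -math.log(1.0 - u) / RATES[x]


def sum_on_chain(n, x0=0, seed=0):
    """Return S_n = sum_{k=0}^{n} xi_{X_k}, where xi_{X_k} = F_{X_k}^{(-1)}(omega_k)."""
    rng = random.Random(seed)
    x, s = x0, 0.0
    for _ in range(n + 1):
        omega = rng.random()                     # omega_k uniform on [0, 1), drawn separately from the chain
        s += quantile(x, omega)                  # the variable attached to the current state
        x = 0 if rng.random() < P[x][0] else 1   # one step of the chain
    return s


print(sum_on_chain(1000) / 1000)
```

Replacing `quantile` by a function that ignores `u` and returns $f(x)$ corresponds to the degenerate case, in which the sum reduces to (13.8.1) up to the range of summation.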


13.8.2  Laws  of  Large  Numbers 

In this and the next subsection we will confine ourselves to Markov chains satisfying the ergodicity conditions from Sects. 13.6 and 13.7. As was already noticed, ergodicity conditions for Harris chains mean, in essence, the existence of a positive atom (possibly in the extended state space). Therefore, for the sake of simplicity, we will assume from the outset that the chain $X$ has a positive atom at a point $x_0$ and put, as before,

$$\tau(x) := \min\{k > 0 : X_k(x) = x_0\}, \quad \tau(x_0) = \tau.$$

Summing up the conditions sufficient for (I$_0$) and (I) to hold (the finiteness of $\tau(x)$ and $\mathbf{E}\tau$) studied in Sect. 13.7, we obtain the following assertion in our case.

Corollary 13.8.1 Let there exist a set $V \in \mathfrak{B}_{\mathcal{X}}$ such that, for the stopping time $\tau_V(x) := \min\{k \ge 1 : X_k(x) \in V\}$, we have

$$E := \sup_{x \in V} \mathbf{E}\tau_V(x) < \infty. \tag{13.8.3}$$

Furthermore, let there exist an $m \ge 1$ such that

$$\inf_{x \in V} \mathbf{P}\Bigg(\bigcup_{j=1}^{m} \{X_j(x) = x_0\}\Bigg) \ge q > 0.$$

Then

$$\mathbf{E}\tau \le \frac{mE}{q}.$$

This assertion follows from Lemma 13.7.2. One can justify conditions (I$_0$) and (13.8.3) by the following assertion.


Corollary 13.8.2 Let there exist an $\varepsilon > 0$ and a nonnegative measurable function $g : \mathcal{X} \to \mathbb{R}$ such that

$$\sup_{x \in V} \mathbf{E}g(X_1(x)) < \infty$$

and, for $x \in V^c$,

$$\mathbf{E}g(X_1(x)) - g(x) \le -\varepsilon.$$

Then conditions (I$_0$) and (13.8.3) are met.


In order to formulate and prove the law of large numbers for the sums (13.8.2), we will use the notion of the increment of the sums (13.8.2) on a cycle between consecutive visits of the chain to the atom $x_0$. Divide the trajectory $X_0, X_1, X_2, \ldots, X_n$ of the chain $X$ on the time interval $[0, n]$ into segments of lengths $\tau_1 := \tau(x), \tau_2, \tau_3, \ldots$ ($\tau_j \stackrel{d}{=} \tau$ for $j \ge 2$) corresponding to the visits of the chain to the atom $x_0$. Denote the increment of the sum $S_n$ on the $k$-th cycle (on $(T_{k-1}, T_k]$) by

$$\zeta_1 := \sum_{j=0}^{T_1} \xi_{X_j}, \qquad \zeta_k := \sum_{j=T_{k-1}+1}^{T_k} \xi_{X_j}, \quad k \ge 2, \quad \text{where } T_k := \sum_{i=1}^{k} \tau_i, \quad k \ge 1, \quad T_0 = 0. \tag{13.8.4}$$

The vectors $(\tau_k, \zeta_k)$, $k \ge 2$, are clearly independent and identically distributed. For brevity, the index $k$ will sometimes be omitted: $(\tau_k, \zeta_k) \stackrel{d}{=} (\tau, \zeta)$ for $k \ge 2$.

Now we can state the law of large numbers for the sums (13.8.2).

Theorem 13.8.1 Let $\mathbf{P}(\tau(x) < \infty) = 1$ for all $x$, $\mathbf{E}\tau < \infty$, $\mathbf{E}|\zeta| < \infty$, and the g.c.d. of all possible values of $\tau$ equal 1. Then

$$\frac{S_n}{n} \xrightarrow{p} \frac{\mathbf{E}\zeta}{\mathbf{E}\tau} \quad \text{as } n \to \infty.$$

Proof Put

$$\nu(n) := \max\{k : T_k \le n\}.$$

Then the sum $S_n$ can be represented as

$$S_n = \zeta_1 + Z_{\nu(n)} + z_n, \tag{13.8.5}$$

where

$$Z_k := \sum_{j=2}^{k} \zeta_j, \qquad z_n := \sum_{j=T_{\nu(n)}+1}^{n} \xi_{X_j}.$$

Since $\tau_1$ and $\zeta_1$ are proper random variables, we have, as $n \to \infty$,

$$\frac{\zeta_1}{n} \xrightarrow{\text{a.s.}} 0. \tag{13.8.6}$$

The sum $z_n$ consists of $\gamma(n) := n - T_{\nu(n)}$ summands. Theorem 10.3.1 implies that the distribution of $\gamma(n)$ converges to a proper limiting distribution, and the same is true for $z_n$. Hence, as $n \to \infty$,

$$\frac{z_n}{n} \xrightarrow{p} 0. \tag{13.8.7}$$

The sums $Z_{\nu(n)}$, being the main part of (13.8.5), are nothing else but a generalised renewal process corresponding to the vectors $(\tau, \zeta)$ (see Sect. 10.6).

Since $\mathbf{E}\tau < \infty$, by Theorem 11.5.2, as $n \to \infty$,

$$\frac{Z_{\nu(n)}}{n} \xrightarrow{p} \frac{\mathbf{E}\zeta}{\mathbf{E}\tau}. \tag{13.8.8}$$

Together with (13.8.6) and (13.8.7) this means that

$$\frac{S_n}{n} \xrightarrow{p} \frac{\mathbf{E}\zeta}{\mathbf{E}\tau}. \tag{13.8.9}$$

The theorem is proved. □

As was already noted, sufficient conditions for $\mathbf{P}(\tau(x) < \infty) = 1$ and $\mathbf{E}\tau < \infty$ to hold are contained in Corollaries 13.8.1 and 13.8.2. It is more difficult to find conditions sufficient for $\mathbf{E}|\zeta| < \infty$ that would be adequate for the nature of the problem.

Below we will obtain certain relations which clarify, to some extent, the connection between the distributions of $\zeta$ and $\tau$ and the stationary distribution of the chain $X$.

Theorem 13.8.2 (A generalisation of the Wald identity) Assume that $\mathbf{E}\tau < \infty$, the g.c.d. of all possible values of $\tau$ is 1, $\pi$ is the stationary distribution of the chain $X$, and

$$\mathbf{E}_\pi \mathbf{E}|\xi_x| := \int \mathbf{E}|\xi_x|\, \pi(dx) < \infty. \tag{13.8.10}$$

Then

$$\mathbf{E}\zeta = \mathbf{E}\tau\, \mathbf{E}_\pi \mathbf{E}\xi_x. \tag{13.8.11}$$

The value of $\mathbf{E}_\pi \mathbf{E}\xi_x$ is the "doubly averaged" value of the random variable $\xi_x$: over the distribution $\mathbf{F}_x$ and over the stationary distribution $\pi$.

Theorem 13.8.2 implies that the condition $\sup_x \mathbf{E}|\xi_x| < \infty$ is sufficient for the finiteness of $\mathbf{E}|\zeta|$.


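Proof of Theorem 13.8.2 First of all, we show that condition (13.8.10) implies the finiteness of $\mathbf{E}|\zeta|$. If $\xi_x \ge 0$ then $\mathbf{E}\zeta$ is always well-defined. If we assume that $\mathbf{E}\zeta = \infty$ then, repeating the proof of Theorem 13.8.1, we would easily obtain that, in this case, $S_n/n \xrightarrow{p} \infty$, and hence necessarily $\mathbf{E}S_n/n \to \infty$ as $n \to \infty$. But

$$\mathbf{E}S_n = \sum_{j=0}^{n} \mathbf{E}\xi_{X_j} = \sum_{j=0}^{n} \int (\mathbf{E}\xi_x)\, \mathbf{P}(X_j \in dx),$$

where the distribution $\mathbf{P}(X_j \in \cdot)$ converges in total variation to $\pi(\cdot)$ as $j \to \infty$,

$$\int (\mathbf{E}\xi_x)\, \mathbf{P}(X_j \in dx) \to \int (\mathbf{E}\xi_x)\, \pi(dx),$$

and hence

$$\frac{1}{n}\, \mathbf{E}S_n \to \mathbf{E}_\pi \mathbf{E}\xi_x < \infty. \tag{13.8.12}$$

This contradicts the above assumption, and therefore $\mathbf{E}\zeta < \infty$. Applying the above argument to the random variables $|\xi_x|$, we conclude that condition (13.8.10) implies $\mathbf{E}|\zeta| < \infty$.

Let, as above, $\eta(n) := \nu(n) + 1 = \min\{k : T_k > n\}$. We will need the following.

Lemma 13.8.5 If $\mathbf{E}|\zeta| < \infty$ then

$$\mathbf{E}\zeta_{\eta(n)} = o(n). \tag{13.8.13}$$

If $\mathbf{E}\zeta^2 < \infty$ then

$$\mathbf{E}\zeta_{\eta(n)}^2 = o(n) \tag{13.8.14}$$

as $n \to \infty$.

Proof Without losing generality, assume that $\xi_x \ge 0$ and $\zeta \ge 0$. Since $\tau_j \ge 1$, we have

$$h(k) := \sum_{j=0}^{k} \mathbf{P}(T_j = k) \le 1 \quad \text{for all } k.$$

Therefore,

$$\mathbf{P}(\zeta_{\eta(n)} > v) = \sum_{k=0}^{n} h(k)\, \mathbf{P}(\zeta > v,\ \tau > n - k) \le \sum_{k=0}^{n} \mathbf{P}(\zeta > v,\ \tau > k).$$

If $\mathbf{E}\zeta < \infty$ then

$$\mathbf{E}\zeta_{\eta(n)} \le \sum_{k=0}^{n} \int_0^\infty \mathbf{P}(\zeta > v,\ \tau > k)\, dv = \sum_{k=0}^{n} \mathbf{E}(\zeta;\ \tau > k), \tag{13.8.15}$$

where $\mathbf{E}(\zeta;\ \tau > k) \to 0$ as $k \to \infty$. This follows from Lemma A3.2.3 of Appendix 3. Together with (13.8.15) this proves (13.8.13).

Similarly, for $\mathbf{E}\zeta^2 < \infty$,

$$\mathbf{E}\zeta_{\eta(n)}^2 \le 2 \sum_{k=0}^{n} \int_0^\infty v\, \mathbf{P}(\zeta > v,\ \tau > k)\, dv = \sum_{k=0}^{n} \mathbf{E}(\zeta^2;\ \tau > k) = o(n).$$

The lemma is proved. □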

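Now we continue the proof of Theorem 13.8.2. Consider representation (13.8.5) for $X_0 = x_0$ and assume again that $\xi_x \ge 0$. Then $\zeta_1 = \xi_{x_0}$,

$$S_n = \zeta_1 + Z_{\eta(n)} + z_n - \zeta_{\eta(n)},$$

where by the Wald identity

$$\mathbf{E}Z_{\eta(n)} = \mathbf{E}\eta(n)\, \mathbf{E}\zeta \sim n\, \frac{\mathbf{E}\zeta}{\mathbf{E}\tau}.$$

Since $\pi(\{x_0\}) = 1/\mathbf{E}\tau > 0$, we have, by (13.8.10), $\mathbf{E}|\xi_{x_0}| < \infty$. Moreover, for $\xi_x \ge 0$,

$$|\zeta_{\eta(n)} - z_n| \le \zeta_{\eta(n)}.$$

Hence, by Lemma 13.8.5,

$$\mathbf{E}S_n = n\, \frac{\mathbf{E}\zeta}{\mathbf{E}\tau} + o(n). \tag{13.8.16}$$

Combining this with (13.8.12), we obtain the assertion of the theorem.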

It remains to consider the case where $\xi_x$ can take values of both signs. Introduce new random variables $\xi_x^*$ on the chain $X$, defined by the equalities $\xi_x^* := |\xi_x|$, and endow with the superscript $*$ all already used notations that will correspond to the new random variables. Since all $\xi_x^* \ge 0$, by condition (13.8.10) we can apply to them all the above assertions and, in particular, obtain that

$$\mathbf{E}\zeta^* < \infty, \qquad \mathbf{E}\zeta_{\eta(n)}^* = o(n). \tag{13.8.17}$$

Since

$$|\zeta| \le \zeta^*, \qquad |\zeta_{\eta(n)}| \le \zeta_{\eta(n)}^*, \qquad |z_n| \le \zeta_{\eta(n)}^*,$$

it follows from (13.8.17) that

$$\mathbf{E}|\zeta| < \infty, \qquad \mathbf{E}|\zeta_{\eta(n)} - z_n| = o(n)$$

and relation (13.8.16) is valid along with identity (13.8.11).

The  theorem  is  proved.  □ 
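Identity (13.8.11) can be verified exactly on a small finite chain. The sketch below is an illustration only: the three-state transition matrix `P`, the choice of state 0 as the atom $x_0$, and the means `m[x]` standing for $\mathbf{E}\xi_x$ are all assumptions made for the example. It computes $\pi$ by power iteration and $\mathbf{E}\tau$, $\mathbf{E}\zeta$ by iterating the first-step equations.

```python
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]       # assumed transition matrix; the atom x0 is state 0
m = [1.0, -2.0, 0.5]         # assumed means m[x] = E xi_x


def stationary(P, iters=2000):
    """Stationary distribution by power iteration pi <- pi P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def cycle_expectations(P, m, atom=0, iters=2000):
    """E tau and E zeta for a cycle started at the atom, via first-step equations.

    t[x] = E tau(x) and w[x] = E sum_{j=1}^{tau(x)} m[X_j] for the chain started at x.
    """
    n = len(P)
    t = [0.0] * n
    w = [0.0] * n
    for _ in range(iters):
        t = [1.0 + sum(P[x][y] * t[y] for y in range(n) if y != atom)
             for x in range(n)]
        w = [sum(P[x][y] * m[y] for y in range(n))
             + sum(P[x][y] * w[y] for y in range(n) if y != atom)
             for x in range(n)]
    return t[atom], w[atom]


pi = stationary(P)
E_tau, E_zeta = cycle_expectations(P, m)
doubly_averaged = sum(pi[x] * m[x] for x in range(len(P)))
print("E zeta            =", E_zeta)
print("E tau * E_pi E xi =", E_tau * doubly_averaged)
print("E tau, 1/pi(x0)   =", E_tau, 1.0 / pi[0])
```

Only the means of the $\xi_x$ enter the identity, so the sketch does not need to specify the distributions $\mathbf{F}_x$ themselves.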


Now  we  will  prove  the  strong  law  of  large  numbers. 


Theorem 13.8.3 Let the conditions of Theorem 13.8.1 be satisfied. Then

$$\frac{S_n}{n} \xrightarrow{\text{a.s.}} \mathbf{E}_\pi \mathbf{E}\xi_x \quad \text{as } n \to \infty.$$

Proof Since in representation (13.8.5) one has $\zeta_1/n \xrightarrow{\text{a.s.}} 0$ as $n \to \infty$, we can neglect this term in (13.8.5).

The strong laws of large numbers for $\{Z_k\}$ and $\{T_k\}$ mean that, for a given $\varepsilon > 0$, the trajectory of $\{S_{T_k}\}$ will lie within the boundaries $k\mathbf{E}\zeta(1 \pm \varepsilon)$ and $\frac{\mathbf{E}\zeta}{\mathbf{E}\tau}\, T_k (1 \pm 2\varepsilon)$ for all $k \ge n$ and $n$ large enough. (We leave a more formal formulation of this to the reader.)

We will prove the theorem if we verify that the probability of the event that, between the times $T_k$, $k \ge n$, the trajectory of $S_j$ will cross at least once the boundaries $rj(1 \pm 3\varepsilon)$, where $r = \frac{\mathbf{E}\zeta}{\mathbf{E}\tau}$, tends to zero as $n \to \infty$. Since

$$\max_{T_{k-1} < j \le T_k} |S_j - S_{T_{k-1}}| \le \zeta_k^* \tag{13.8.18}$$

(in the notation of the proof of Theorem 13.8.1), it is sufficient to verify that $\mathbf{P}(A_n) \to 0$ as $n \to \infty$, where $A_n := \bigcup_{k=n}^{\infty} \{\zeta_k^* > \varepsilon r T_k\}$. But

$$\mathbf{P}(A_n) = \mathbf{P}(A_n B_n) + \mathbf{P}(A_n \overline{B}_n), \tag{13.8.19}$$

where

$$B_n := \bigcap_{k=n}^{\infty} \{T_k > k\mathbf{E}\tau(1 - \varepsilon)\}, \qquad \mathbf{P}(\overline{B}_n) \to 0$$

as $n \to \infty$, so the second summand in (13.8.19) tends to zero. The first summand on the right-hand side of (13.8.19) does not exceed (for $c = \varepsilon(1 - \varepsilon)\mathbf{E}\zeta$)

$$\mathbf{P}\Bigg(\bigcup_{k=n}^{\infty} \{\zeta_k^* > \varepsilon \mathbf{E}\zeta (1 - \varepsilon) k\}\Bigg) \le \sum_{k=n}^{\infty} \mathbf{P}(\zeta_k^* > ck) \to 0$$

as $n \to \infty$, since $\mathbf{E}\zeta^* < \infty$ (see (13.8.17)). The theorem is proved.


□ 
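A quick simulation illustrates the law of large numbers for a chain with an atom. The sketch below is not from the text: it assumes a two-state chain on $\{0, 1\}$ with hypothetical transition probabilities `alpha` (from 0 to 1) and `beta` (from 1 to 0), takes the atom $x_0 = 0$ and the degenerate case $\xi_x = f(x) = x$, so that $\mathbf{E}\zeta/\mathbf{E}\tau = \alpha/(\alpha + \beta)$.

```python
import random


def time_average(n, alpha=0.3, beta=0.2, seed=1):
    """Return S_n / n for S_n = sum_{k=0}^{n} f(X_k), f(x) = x, on the assumed two-state chain."""
    rng = random.Random(seed)
    x, s = 0, 0                         # start at the atom x0 = 0
    for _ in range(n + 1):
        s += x
        u = rng.random()
        if x == 0:
            x = 1 if u < alpha else 0   # leave state 0 with probability alpha
        else:
            x = 0 if u < beta else 1    # return to state 0 with probability beta
    return s / n


limit = 0.3 / (0.3 + 0.2)               # pi({1}) = alpha / (alpha + beta) = E zeta / E tau
for n in (100, 10_000, 200_000):
    print(f"n = {n:7d}   S_n / n = {time_average(n):.4f}   limit = {limit:.4f}")
```

For this chain a cycle from the atom has $\mathbf{E}\tau = (\alpha + \beta)/\beta$ and $\mathbf{E}\zeta = \alpha/\beta$, so the ratio agrees with the stationary probability of state 1.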


13.8.3  The  Central  Limit  Theorem 

As in Theorem 13.8.1, first we will prove the main assertion under certain conditions on the moments of $\zeta$ and $\tau$, and then we will establish a connection of these conditions to the stationary distribution of the chain $X$. Below we retain the notation of the previous section.

Theorem 13.8.4 Let $\mathbf{P}(\tau(x) < \infty) = 1$ for any $x$, $\mathbf{E}\tau^2 < \infty$, the g.c.d. of all possible values of $\tau$ be 1, and $\mathbf{E}\zeta^2 < \infty$. Then, as $n \to \infty$,

$$\frac{S_n - rn}{d\sqrt{n/a}} \mathrel{\subset\!\!\!\Rightarrow} \boldsymbol{\Phi}_{0,1},$$

where $r := a_\zeta / a$, $a_\zeta := \mathbf{E}\zeta$, $a := \mathbf{E}\tau$ and $d^2 := \mathbf{D}(\zeta - r\tau)$.


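Proof We again make use of representation (13.8.5), where clearly

$$\frac{\zeta_1}{\sqrt{n}} \xrightarrow{\text{a.s.}} 0, \qquad \frac{z_n}{\sqrt{n}} \xrightarrow{p} 0$$

(see the proof of Theorem 13.8.1). This means that the problem reduces to that of finding the limiting distribution of $Z_{\nu(n)} = Z_{\eta(n)} - \zeta_{\eta(n)}$, where by Lemma 10.6.1 $\zeta_{\eta(n)}$ has a proper limiting distribution, and so $\zeta_{\eta(n)}/\sqrt{n} \xrightarrow{p} 0$ as $n \to \infty$. Furthermore, by Theorem 10.6.3,

$$\frac{Z_{\eta(n)} - rn}{\sigma_S \sqrt{n}} \mathrel{\subset\!\!\!\Rightarrow} \boldsymbol{\Phi}_{0,1},$$

where $\sigma_S^2 := a^{-1}\mathbf{D}(\zeta - r\tau)$.

The theorem is proved. □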


Now we will establish relations between the moment characteristics used for normalising $S_n$ and the stationary distribution $\pi$. The answer for the number $r$ was given in Theorem 13.8.2: $r = \mathbf{E}_\pi \mathbf{E}\xi_x$. For the number $\sigma_S^2$ we have the following result.

Theorem 13.8.5 Let

$$\sigma^2 := \int \mathbf{D}\xi_x\, \pi(dx) + 2 \sum_{j=1}^{\infty} \mathbf{E}(\xi_{X_0} - r)(\xi_{X_j} - r)$$

be well-defined and finite, where $X_0 \mathrel{\subset\!\!\!=} \pi$. Then

$$\sigma_S^2 := a^{-1} d^2 = \sigma^2.$$

Note that here the expectation under the sum sign is a "triple averaging": over the distribution $\pi(dy)\mathbf{P}(y, j, dz)$ and the distributions of $\xi_y$ and $\xi_z$.
Proof We have

$$\mathbf{E}\Bigg(\sum_{k=0}^{n} (\xi_{X_k} - r)\Bigg)^2 = \sum_{k=0}^{n} \mathbf{E}(\xi_{X_k} - r)^2 + 2 \sum_{k<j} \mathbf{E}(\xi_{X_k} - r)(\xi_{X_j} - r), \tag{13.8.20}$$

where

$$\sum_{k=0}^{n} \mathbf{E}(\xi_{X_k} - r)^2 = \sum_{k=0}^{n} \mathbf{E}(\xi_{X_k} - \mathbf{E}\xi_{X_k})^2 + \sum_{k=0}^{n} (\mathbf{E}\xi_{X_k} - r)^2. \tag{13.8.21}$$

The summands in the first sum on the right-hand side of (13.8.21) converge to $\sigma_\xi^2 := \int \mathbf{D}\xi_x\, \pi(dx)$, the summands in the second sum converging to zero. Therefore, the left-hand side of (13.8.21) is asymptotically equivalent to $n\sigma_\xi^2$.

Further,

$$\sum_{k<j} \mathbf{E}(\xi_{X_k} - r)(\xi_{X_j} - r) = \sum_{k=0}^{n-1} \sum_{j=k+1}^{n} \mathbf{E}(\xi_{X_k} - r)(\xi_{X_j} - r), \tag{13.8.22}$$

where the distribution of $X_k$ converges in total variation to the stationary distribution $\pi$ of the chain. Hence the inner sums on the right-hand side of (13.8.22), for large $k$ and $n - k$ (say, for $\sqrt{n} < k < n - \sqrt{n}$ when $n \to \infty$), will be close to

$$E := \sum_{j=1}^{\infty} \mathbf{E}(\xi_{X_0} - r)(\xi_{X_j} - r),$$

where $X_0 \mathrel{\subset\!\!\!=} \pi$, and the whole sum on the right-hand side of (13.8.22) is asymptotically equivalent, as $n \to \infty$, to $nE$ (or will be $o(n)$ if $E = 0$).

Thus

$$\frac{1}{n}\, \mathbf{E}(S_n - rn)^2 \sim \sigma_\xi^2 + 2E. \tag{13.8.23}$$

We now show that the existence of $\sigma_\xi^2$ and $E$ implies the finiteness of $d^2 = \mathbf{E}(\zeta - r\tau)^2$.

Consider the truncated random variables

$$\xi_x^{(N)} := \begin{cases} \xi_x & \text{if } \xi_x \in [-N, N], \\ N & \text{if } \xi_x > N, \\ -N & \text{if } \xi_x < -N. \end{cases}$$

Since $\sigma_\xi^2 < \infty$, we have $\mathbf{E}\xi_x^2 < \infty$ (a.e. with respect to the measure $\pi$) and

$$r^{(N)} \to r, \qquad (\sigma_\xi^{(N)})^2 \to \sigma_\xi^2, \qquad E^{(N)} \to E \quad \text{as } N \to \infty,$$

where the superscript $(N)$ means that the notation corresponds to the truncated random variables. By virtue of Theorem 13.8.4,

$$\liminf_{n \to \infty} \frac{1}{n}\, \mathbf{E}\big(S_n^{(N)} - r^{(N)} n\big)^2 \ge a^{-1} (d^{(N)})^2.$$

If we assume that $d = \infty$ then we will get that the lim inf on the left-hand side of this relation is infinite. But this contradicts relation (13.8.23), by which the above lim inf equals $(\sigma_\xi^{(N)})^2 + 2E^{(N)}$ and remains bounded. We have obtained a contradiction, which shows that $d < \infty$.

On the other hand, for $d < \infty$, $\mathbf{E}\zeta^2 < \infty$ and, for the initial value $x_0$, by (13.8.5) we have

$$\mathbf{E}(S_n - rn)^2 = \mathbf{E}(Z_{\nu(n)} + z_n - rn)^2 = \mathbf{E}(Z_{\eta(n)} - rn)^2 + 2\mathbf{E}(Z_{\eta(n)} - rn)(z_n - \zeta_{\eta(n)}) + \mathbf{E}(z_n - \zeta_{\eta(n)})^2, \tag{13.8.24}$$

where $n = T_{\eta(n)} - \chi(n)$. Therefore, putting $Y_k := Z_k - rT_k = \sum_{j=1}^{k} (\zeta_j - r\tau_j)$, we obtain

$$\mathbf{E}(Z_{\eta(n)} - rn)^2 = \mathbf{E}Y_{\eta(n)}^2 + 2r\,\mathbf{E}Y_{\eta(n)}\chi(n) + r^2\,\mathbf{E}\chi^2(n).$$

By virtue of (10.4.7), $\mathbf{E}\chi^2(n) = o(n)$. By (10.6.4) (with a somewhat different notation),

$$\mathbf{E}Y_{\eta(n)}^2 = d^2\, \mathbf{E}\eta(n),$$

where $d^2 := \mathbf{D}(\zeta - r\tau)$, $\mathbf{E}\eta(n) \sim n/a$ and $a = \mathbf{E}\tau$. Hence, applying the Cauchy-Bunjakovsky inequality, we get

$$|\mathbf{E}Y_{\eta(n)}\chi(n)| = o(n), \qquad \mathbf{E}(Z_{\eta(n)} - rn)^2 \sim n d^2 a^{-1}. \tag{13.8.25}$$

It remains to estimate the last two terms on the right-hand side of (13.8.24). But

$$|z_n - \zeta_{\eta(n)}| \le \zeta_{\eta(n)}^*,$$

where $\zeta^*$ corresponds to the summands $\xi_{X_k}^* = |\xi_{X_k}|$ and where, by Lemma 13.8.5 applied to $\xi_x^* = |\xi_x|$, we have

$$\mathbf{E}\big(\zeta_{\eta(n)}^*\big)^2 = o(n).$$

Therefore $\mathbf{E}(\zeta_{\eta(n)} - z_n)^2 = o(n)$ and, by the Cauchy-Bunjakovsky inequality and relation (13.8.25), the same relation is valid for the mixed moment in (13.8.24). Thus,

$$\mathbf{E}(S_n - rn)^2 \sim a^{-1} d^2 n.$$

Combining this relation with (13.8.23), we obtain the assertion of the theorem. □
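The equality of the two variance expressions can be checked numerically on a small example. The sketch below rests on several assumptions that are not in the text: a two-state chain with hypothetical transition probabilities `alpha` and `beta`, the atom $x_0 = 0$, and the degenerate case $\xi_x = f(x) = x$. The non-covariance term is read here as the stationary second moment $\mathbf{E}(\xi_{X_0} - r)^2$, $X_0 \mathrel{\subset\!\!\!=} \pi$, which is the quantity the first sum in (13.8.20) averages to; this reading is an assumption of the example.

```python
alpha, beta = 0.3, 0.2                       # assumed two-state chain; atom x0 = 0
P = [[1 - alpha, alpha], [beta, 1 - beta]]
pi1 = alpha / (alpha + beta)                 # stationary probability of state 1
r = pi1                                      # r = E_pi f for f(x) = x

# Series side: stationary second moment plus twice the summed covariances.
sigma2 = pi1 * (1 - pi1)                     # E (f(X_0) - r)^2 under X_0 ~ pi
row = [0.0, 1.0]                             # distribution of X_j given X_0 = 1, j = 0
for _ in range(1, 500):
    row = [row[0] * P[0][0] + row[1] * P[1][0],
           row[0] * P[0][1] + row[1] * P[1][1]]
    # E (f(X_0) - r)(f(X_j) - r) = pi1 * (P^j[1][1] - pi1)
    sigma2 += 2 * pi1 * (row[1] - pi1)

# Regenerative side: tau = 1 w.p. 1 - alpha, tau = k >= 2 w.p. alpha (1 - beta)^(k-2) beta,
# and zeta = tau - 1 (the number of visits to state 1 within a cycle).
a = 0.0
d2 = 0.0
for k in range(1, 3000):
    prob = (1 - alpha) if k == 1 else alpha * (1 - beta) ** (k - 2) * beta
    a += k * prob
    d2 += ((k - 1) - r * k) ** 2 * prob      # E (zeta - r tau)^2; zeta - r tau has mean 0

print("series side       :", sigma2)
print("regenerative side :", d2 / a)
```

Both sums are truncated; with the chosen parameters the neglected tails are far below floating-point precision.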


Chapter  14 

Information  and  Entropy 


Abstract  Section  14.1  presents  the  definitions  and  key  properties  of  information 
and  entropy.  Section  14.2  discusses  the  entropy  of  a  (stationary)  finite  Markov  chain. 
The  Law  of  Large  Numbers  is  proved  for  the  amount  of  information  contained  in 
a  message  that  is  a  long  sequence  of  successive  states  of  a  Markov  chain,  and  the 
asymptotic  behaviour  of  the  number  of  the  most  common  states  in  a  sequence  of 
successive  values  of  the  chain  is  established.  Applications  of  this  result  to  coding 
are  discussed. 


14.1  The  Definitions  and  Properties  of  Information  and  Entropy 

Suppose  one  conducts  an  experiment  whose  outcome  is  not  predetermined.  The 
term  “experiment”  will  have  a  broad  meaning.  It  may  be  a  test  of  a  new  device,  a 
satellite  launch,  a  football  match,  a  referendum  and  so  on.  If,  in  a  football  match, 
the  first  team  is  stronger  than  the  second,  then  the  occurrence  of  the  event  A  that  the 
first  team  won  carries  little  significant  information.  On  the  contrary,  the  occurrence 
of the complementary event $\overline{A}$ contains a lot of information. The event $B$ that a
leading  player  of  the  first  team  was  injured  does  contain  information  concerning  the 
event  A.  But  if  it  was  the  first  team’s  doctor  who  was  injured  then  that  would  hardly 
affect  the  match  outcome,  so  such  an  event  B  carries  no  significant  information 
about  the  event  A. 

The following quantitative measure of information is conventionally adopted. Let $A$ and $B$ be events from some probability space $(\Omega, \mathfrak{F}, \mathbf{P})$.

Definition 14.1.1 The amount of information about the event $A$ contained in the event (message) $B$ is the quantity

$$I(A|B) := \log \frac{\mathbf{P}(A|B)}{\mathbf{P}(A)}.$$

The notions of the "amount of information" and "entropy" were introduced by C.E. Shannon in 1948. For some special situations the notion of amount of information had also been considered in earlier papers (e.g., by R.V.L. Hartley, 1928). The exposition in Sect. 14.2 of this chapter is substantially based on the paper of A.Ya. Khinchin [21].

The occurrence of the event $B = A$ may be interpreted as the message that $A$ took place.

Definition 14.1.2 The number $I(A) := I(A|A)$ is called the amount of information contained in the message $A$:

$$I(A) := I(A|A) = -\log \mathbf{P}(A).$$

We see from this definition that the larger the probability of the event $A$, the smaller $I(A)$. As a rule, the logarithm to the base 2 is used in the definition of information. Thus, say, the message that a boy (or girl) was born in a family carries a unit of information (it is supposed that these events are equiprobable, and $-\log_2 p = 1$ for $p = 1/2$). Throughout this chapter, we will write just $\log x$ for $\log_2 x$.

If the events $A$ and $B$ are independent, then $I(A|B) = 0$. This means that the event $B$ does not carry any information about $A$, and vice versa. It is worth noting that we always have

$$I(A|B) = I(B|A).$$

It is easy to see that if the events $A$ and $B$ are independent, then

$$I(AB) = I(A) + I(B). \tag{14.1.1}$$

Consider an example. Let a chessman be placed at random on one of the squares of a chessboard. The information that the chessman is on square number $k$ (the event $A$) is equal to $I(A) = \log 64 = 6$. Let $B_1$ be the event that the chessman is in the $i$-th row, and $B_2$ that the chessman is in the $j$-th column. The message $A$ can be transmitted by transmitting $B_1$ first and then $B_2$. We have

$$I(B_1) = \log 8 = 3 = I(B_2).$$

Therefore

$$I(B_1) + I(B_2) = 6 = I(A),$$

so that transmitting the message $A$ "by parts" requires communicating the same amount of information (which is equal to 6) as transmitting $A$ itself. One could give other examples showing that the introduced numerical characteristics are quite natural.
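The chessboard arithmetic can be reproduced in a few lines. The helper names `info` and `mutual_info` below are hypothetical; they simply transcribe Definitions 14.1.2 and 14.1.1 with base-2 logarithms.

```python
import math


def info(p):
    """Amount of information I(A) = -log2 P(A) in a message of probability p."""
    return -math.log2(p)


def mutual_info(p_a_given_b, p_a):
    """I(A|B) = log2( P(A|B) / P(A) )."""
    return math.log2(p_a_given_b / p_a)


square = info(1 / 64)   # A: the chessman is on a given square
row = info(1 / 8)       # B1: it is in a given row
column = info(1 / 8)    # B2: it is in a given column
print("I(A) =", square, "  I(B1) + I(B2) =", row + column)

# Learning the row raises the probability of the square from 1/64 to 1/8.
print("I(A|B1) =", mutual_info(1 / 8, 1 / 64))
```

The row and column events are independent, so the additivity seen here is an instance of (14.1.1).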

Let $G$ be an experiment with outcomes $E_1, \ldots, E_N$ occurring with probabilities $p_1, \ldots, p_N$.

The information resulting from the experiment $G$ is a random variable $J_G = J_G(\omega)$ assuming the value $-\log p_j$ on the set $E_j$, $j = 1, \ldots, N$.

Thus, if in the probability space $(\Omega, \mathfrak{F}, \mathbf{P})$ corresponding to the experiment $G$, $\Omega$ coincides with the set $(E_1, \ldots, E_N)$, then $J_G(\omega) = I(\omega)$.

Definition 14.1.3 The expectation of the information obtained in the experiment $G$, $\mathbf{E}J_G = -\sum p_j \log p_j$, is called the entropy of the experiment. We shall denote it by

$$H_{\mathbf{p}} = H(G) := -\sum_{j=1}^{N} p_j \log p_j,$$

where $\mathbf{p} = (p_1, \ldots, p_N)$. For $p_j = 0$, by continuity we set $p_j \log p_j$ to be equal to zero.

Fig. 14.1 The plot of the entropy $f(p)$ of a random experiment with two outcomes

The entropy of an experiment is, in a sense, a measure of its uncertainty. Let, for example, our experiment have two outcomes $A$ and $B$ with probabilities $p$ and $q = 1 - p$, respectively. The entropy of the experiment is equal to

$$H_{\mathbf{p}} = -p \log p - (1 - p) \log(1 - p) = f(p).$$

The graph of this function is depicted in Fig. 14.1.

The only maximum of $f(p)$ equals $\log 2 = 1$ and is attained at the point $p = 1/2$. This is the case of maximum uncertainty. If $p$ decreases, then the uncertainty also decreases together with $H_{\mathbf{p}}$, and $H_{\mathbf{p}} = 0$ for $\mathbf{p} = (0, 1)$ or $(1, 0)$.
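The shape of the two-outcome entropy can be tabulated directly. The function name `binary_entropy` is a hypothetical label for $f(p)$; the convention $0 \log 0 = 0$ from Definition 14.1.3 is implemented by skipping zero probabilities.

```python
import math


def binary_entropy(p):
    """f(p) = -p log2 p - (1 - p) log2(1 - p), with 0 log 0 taken as 0."""
    return 0.0 - sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0.0)


for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"p = {p:4.2f}   f(p) = {binary_entropy(p):.4f}")
```

The table is symmetric about $p = 1/2$, where the value 1 (one bit) is attained, and vanishes at both endpoints.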

The  same  properties  can  easily  be  seen  in  the  general  case  as  well. 

The properties of entropy.

1. $H(G) = 0$ if and only if there exists a $j$, $1 \le j \le N$, such that $p_j = \mathbf{P}(E_j) = 1$.

2. $H(G)$ attains its maximum when $p_j = 1/N$ for all $j$.


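Proof The second derivative of the function $\beta(x) = x \log x$ is positive on $[0, 1]$, so that $\beta(x)$ is convex. Therefore, for any $q_i \ge 0$ such that $\sum_{i=1}^{N} q_i = 1$, and any $x_i \ge 0$, one has the inequality

$$\beta\Bigg(\sum_{i=1}^{N} q_i x_i\Bigg) \le \sum_{i=1}^{N} q_i \beta(x_i).$$

If we take $q_i = 1/N$, $x_i = p_i$, then

$$\frac{1}{N} \log \frac{1}{N} \le \frac{1}{N} \sum_{i=1}^{N} p_i \log p_i.$$

Setting $\mathbf{u} := \big(\frac{1}{N}, \ldots, \frac{1}{N}\big)$ we obtain from this that

$$-\log \frac{1}{N} = \log N = H_{\mathbf{u}} \ge -\sum_{i=1}^{N} p_i \log p_i = H_{\mathbf{p}}.$$

□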


Note that if the entropy $H(G)$ equals its maximum value $H(G) = \log N$, then $J_G(\omega) = \log N$ with probability 1, i.e. the information $J_G(\omega)$ becomes constant.

3. Let $G_1$ and $G_2$ be two independent experiments. We write down the outcomes and their probabilities in these experiments in the following way:

$$G_1 = \begin{pmatrix} E_1, \ldots, E_N \\ p_1, \ldots, p_N \end{pmatrix}, \qquad G_2 = \begin{pmatrix} A_1, \ldots, A_M \\ q_1, \ldots, q_M \end{pmatrix}.$$

Combining the outcomes of these two experiments we obtain a new experiment

$$G = G_1 \times G_2 = \begin{pmatrix} E_1 A_1, E_1 A_2, \ldots, E_N A_M \\ p_1 q_1, p_1 q_2, \ldots, p_N q_M \end{pmatrix}.$$

The information $J_G$ obtained as a result of this experiment is a random variable taking values $-\log p_i q_j$ with probabilities $p_i q_j$, $i = 1, \ldots, N$; $j = 1, \ldots, M$. But the sum $J_{G_1} + J_{G_2}$ of two independent random variables equal to the amounts of information obtained in the experiments $G_1$ and $G_2$, respectively, clearly has the same distribution. Thus the information obtained in a sequence of independent experiments is equal to the sum of the information from these experiments. Since in that case clearly

$$\mathbf{E}J_G = \mathbf{E}J_{G_1} + \mathbf{E}J_{G_2},$$

we have that for independent $G_1$ and $G_2$ the entropy of the experiment $G$ is equal to the sum of the entropies of the experiments $G_1$ and $G_2$:

$$H(G) = H(G_1) + H(G_2).$$

4. If the experiments $G_1$ and $G_2$ are dependent, then the experiment $G$ can be represented as

\[ G = \begin{pmatrix} E_1A_1, E_1A_2, \dots, E_NA_M \\ q_{11}, q_{12}, \dots, q_{NM} \end{pmatrix} \]

with $q_{ij} = p_i p_{ij}$, where $p_{ij}$ is the conditional probability of the event $A_j$ given $E_i$, so that

\[ \sum_{j=1}^M q_{ij} = p_i = \mathbf{P}(E_i), \quad i = 1, \dots, N; \qquad \sum_{i=1}^N q_{ij} = q_j = \mathbf{P}(A_j), \quad j = 1, \dots, M. \]

In this case the equality $J_G = J_{G_1} + J_{G_2}$, generally speaking, does not hold. Introduce a random variable $J_2^*$ which is equal to $-\log p_{ij}$ on the set $E_iA_j$. Then evidently $J_G = J_{G_1} + J_2^*$. Since

\[ \mathbf{P}(A_j \mid E_i) = p_{ij}, \]

the quantity $J_2^*$ for a fixed $i$ can be considered as the information from the experiment $G_2$ given the event $E_i$ occurred. We will call the quantity

\[ \mathbf{E}(J_2^* \mid E_i) = -\sum_{j=1}^M p_{ij} \log p_{ij} \]

the conditional entropy $H(G_2 \mid E_i)$ of the experiment $G_2$ given $E_i$, and the quantity

\[ \mathbf{E}J_2^* = -\sum_{i,j} q_{ij} \log p_{ij} = \sum_i p_i H(G_2 \mid E_i) \]

the conditional entropy $H(G_2 \mid G_1)$ of the experiment $G_2$ given $G_1$. In this notation, we obviously have

\[ H(G) = H(G_1) + H(G_2 \mid G_1). \]

We will prove that in this equality we always have

\[ H(G_2 \mid G_1) \le H(G_2), \]

i.e. for two experiments $G_1$ and $G_2$ the entropy $H(G)$ never exceeds the sum of the entropies $H(G_1)$ and $H(G_2)$:

\[ H(G) = H(G_1 \times G_2) \le H(G_1) + H(G_2). \]

Equality takes place here only when $q_{ij} = p_iq_j$, i.e. when $G_1$ and $G_2$ are independent.

Proof First note that, for any two distributions $(u_1, \dots, u_n)$ and $(v_1, \dots, v_n)$, one has the inequality

\[ -\sum_i u_i \log u_i \le -\sum_i u_i \log v_i, \tag{14.1.2} \]

equality being possible here only if $v_i = u_i$, $i = 1, \dots, n$. This follows from the concavity of the function $\log x$, since it implies that, for any $a_i > 0$,

\[ \sum_i u_i \log a_i \le \log\Bigl(\sum_i u_i a_i\Bigr), \]

equality being possible only if $a_1 = a_2 = \dots = a_n$. Putting $a_i = v_i/u_i$, we obtain relation (14.1.2).

Next we have

\[ H(G_1) + H(G_2) = -\sum_{i,j} q_{ij}(\log p_i + \log q_j) = -\sum_{i,j} q_{ij} \log p_iq_j, \]

and because $\{p_iq_j\}$ is obviously a distribution, by virtue of (14.1.2)

\[ -\sum_{i,j} q_{ij} \log p_iq_j \ge -\sum_{i,j} q_{ij} \log q_{ij} = H(G) \]

holds, and equality is possible here only if $q_{ij} = p_iq_j$. □
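The relations $H(G) = H(G_1) + H(G_2 \mid G_1)$ and $H(G_2 \mid G_1) \le H(G_2)$ can be verified on a small example. The sketch below is only an illustration: the joint distribution `q` is a hypothetical choice and base-2 logarithms are assumed.

```python
import math

def entropy(p):
    """Entropy -sum p log2 p of a finite distribution, with 0 * log 0 = 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Hypothetical joint distribution q[i][j] = P(E_i A_j) of two dependent experiments.
q = [[0.30, 0.10],
     [0.05, 0.55]]

p = [sum(row) for row in q]            # p_i = P(E_i), row sums
qm = [sum(col) for col in zip(*q)]     # q_j = P(A_j), column sums

h_joint = entropy([x for row in q for x in row])   # H(G)
h1, h2 = entropy(p), entropy(qm)                   # H(G_1), H(G_2)

# H(G_2 | G_1) = -sum_{i,j} q_ij log p_ij with p_ij = q_ij / p_i
h2_given_1 = -sum(x * math.log2(x / p[i])
                  for i, row in enumerate(q) for x in row if x > 0)
```

Here `h_joint` coincides with `h1 + h2_given_1` up to rounding, while `h2_given_1` is strictly smaller than `h2` because the chosen `q` is not a product distribution.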

5. As we saw when considering property 3, the information obtained as a result of the experiment $G_1^n$ consisting of $n$ independent repetitions of the experiment $G_1$ is equal to

\[ J_{G_1^n} = -\sum_{j=1}^N \nu_j \log p_j, \]

where $\nu_j$ is the number of occurrences of the outcome $E_j$. By the law of large numbers, $\nu_j/n \xrightarrow{p} p_j$ as $n \to \infty$, and hence

\[ \frac{1}{n} J_{G_1^n} \xrightarrow{p} H(G_1) = -\sum_{j=1}^N p_j \log p_j. \]

To  conclude  this  section,  we  note  that  the  measure  of  the  amount  of  information 
resulting  from  an  experiment  we  considered  here  can  be  derived  as  the  only  possible 
one  (up  to  a  constant  multiplier)  if  one  starts  with  a  few  simple  requirements  that 
are  natural  to  impose  on  such  a  quantity. 

It is also interesting to note the connections between the above-introduced notions and large deviation probabilities. As one can see from Theorems 5.1.2 and 5.2.4, the difference between the “biased” entropy $-\sum p_j^* \ln p_j$ and the entropy $-\sum p_j^* \ln p_j^*$ ($p_j^* = \nu_j/n$ are the relative frequencies of the outcomes $E_j$) is an analogue of the deviation function (see Sect. 8.8) in the multi-dimensional case.


14.2  The  Entropy  of  a  Finite  Markov  Chain.  A  Theorem  on  the 
Asymptotic  Behaviour  of  the  Information  Contained  in  a 
Long  Message;  Its  Applications 

14.2.1  The  Entropy  of  a  Sequence  of  Trials  Forming  a  Stationary 
Markov  Chain 

Let $\{X_k\}_{k=1}^\infty$ be a stationary finite Markov chain with one class of essential states without subclasses, $E_1, \dots, E_N$ being its states. Stationarity of the chain means that $\mathbf{P}(X_1 = j) = \pi_j$ coincide with the stationary probabilities. It is clear that

\[ \mathbf{P}(X_2 = j) = \sum_k \pi_k p_{kj} = \pi_j, \qquad \mathbf{P}(X_3 = j) = \pi_j, \quad \text{and so on.} \]

Let $G_k$ be an experiment determining the value of $X_k$ (i.e. the state the system entered on the $k$-th step). If $X_{k-1} = i$, then the entropy of the $k$-th step equals

\[ H(G_k \mid X_{k-1} = i) = -\sum_j p_{ij} \log p_{ij}. \]

By definition, the entropy of a stationary Markov chain is equal to

\[ H = \mathbf{E}H(G_k \mid X_{k-1}) = H(G_k \mid G_{k-1}) = -\sum_i \pi_i \sum_j p_{ij} \log p_{ij}. \]

Consider the first $n$ steps $X_1, \dots, X_n$ of the Markov chain. By the Markov property, the entropy of this composite experiment $G^{(n)} = G_1 \times \dots \times G_n$ is equal to

\[ H(G^{(n)}) = H(G_1) + H(G_2 \mid G_1) + \dots + H(G_n \mid G_{n-1}) = -\sum_j \pi_j \log \pi_j + (n-1)H \sim nH \]

as $n \to \infty$. If $X_k$ were independent then, as we saw, we would have exact equality here.

¹See, e.g., [11].
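The entropy of a stationary chain is easy to compute once the stationary distribution is known. The following sketch is an illustration under stated assumptions: the two-state transition matrix `P` is hypothetical, logarithms are taken to base 2, and the stationary distribution is approximated by simple iteration of $\pi \mapsto \pi P$.

```python
import math

# Hypothetical two-state transition matrix; all entries are positive, so the chain is ergodic.
P = [[0.9, 0.1],
     [0.4, 0.6]]

def stationary(P, iterations=1000):
    """Approximate the stationary distribution by repeatedly applying pi <- pi P."""
    n = len(P)
    pi = [1 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[k] * P[k][j] for k in range(n)) for j in range(n)]
    return pi

def chain_entropy(P, pi):
    """H = -sum_i pi_i sum_j p_ij log2 p_ij."""
    return -sum(pi[i] * p * math.log2(p)
                for i, row in enumerate(P) for p in row if p > 0)

pi = stationary(P)
H = chain_entropy(P, pi)                                  # entropy per step of the chain
H_one_step = -sum(x * math.log2(x) for x in pi if x > 0)  # entropy of a single observation
```

For this matrix the stationary distribution is $(0.8, 0.2)$, and `H` is smaller than `H_one_step`, in agreement with $H(G_k \mid G_{k-1}) \le H(G_k)$.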


14.2.2  The  Law  of  Large  Numbers  for  the  Amount  of  Information 
Contained  in  a  Message 


Now consider a finite sequence $(X_1, \dots, X_n)$ as a message (event) $C_n$ and denote, as before, by $I(C_n) = -\log \mathbf{P}(C_n)$ the amount of information contained in $C_n$. The value of $I(C_n)$ is a function on the space of elementary outcomes equal to the information $J_{G^{(n)}}$ contained in the experiment $G^{(n)}$. We now show that, with probability close to 1, this information behaves asymptotically as $nH$, as was the case for independent $X_k$. Therefore $H$ is essentially the average information per trial in the sequence $\{X_k\}_{k=1}^\infty$.

Theorem 14.2.1 As $n \to \infty$,

\[ \frac{I(C_n)}{n} = \frac{-\log \mathbf{P}(C_n)}{n} \xrightarrow{a.s.} H. \]

This means that, for any $\delta > 0$, the set of all messages $C_n$ can be decomposed into two classes. For the first class, $|I(C_n)/n - H| < \delta$, and the sum of the probabilities of the elements of the second class tends to 0 as $n \to \infty$.


Proof Construct from the given Markov chain a new one by setting $Y_k := (X_k, X_{k+1})$. The states of the new chain are pairs of states $(E_i, E_j)$ of the chain $\{X_k\}$ with $p_{ij} > 0$. The transition probabilities are obviously given by

\[ p_{(i,j)(k,l)} = \begin{cases} 0, & j \ne k, \\ p_{kl}, & j = k. \end{cases} \]

Note that one can easily prove by induction that

\[ p_{(i,j)(k,l)}(n) = p_{jk}(n-1)\,p_{kl}. \tag{14.2.1} \]

From the definition of $\{Y_k\}$ it follows that the ergodic theorem holds for this chain. This can also be seen directly from (14.2.1), the stationary probabilities being

\[ \lim_{n \to \infty} p_{(i,j)(k,l)}(n) = \pi_k p_{kl}. \]

Now we will need the law of large numbers for the number of visits $m_{(k,l)}(n)$ of the chain $\{Y_k\}_{k=1}^\infty$ to state $(k, l)$ over time $n$. By virtue of this law (see Theorem 13.4.4),

\[ \frac{m_{(k,l)}(n)}{n} \xrightarrow{a.s.} \pi_k p_{kl} \quad \text{as } n \to \infty. \]

Consider the random variable $\mathbf{P}(C_n)$:

\[ \mathbf{P}(C_n) = \pi_{X_1} p_{X_1X_2} \cdots p_{X_{n-1}X_n} = \pi_{X_1} \prod_{(k,l)} p_{kl}^{\,m_{(k,l)}(n-1)}. \]

The product here is taken over all pairs $(k, l)$. Therefore ($\pi_i = \mathbf{P}(X_1 = i)$)

\[ \log \mathbf{P}(C_n) = \log \pi_{X_1} + \sum_{k,l} m_{(k,l)}(n-1) \log p_{kl}, \]

\[ \frac{1}{n} \log \mathbf{P}(C_n) \xrightarrow{a.s.} \sum_{k,l} \pi_k p_{kl} \log p_{kl} = -H. \]

□
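Theorem 14.2.1 lends itself to a simple Monte Carlo illustration. The sketch below is not a proof, only a numerical check under stated assumptions: the two-state matrix `P` with stationary distribution `pi = (0.8, 0.2)` is hypothetical, logarithms are base 2, and the random generator is seeded for reproducibility.

```python
import math
import random

# Hypothetical two-state chain and its stationary distribution (pi P = pi).
P = [[0.9, 0.1],
     [0.4, 0.6]]
pi = [0.8, 0.2]

# Entropy of the chain, H = -sum_i pi_i sum_j p_ij log2 p_ij.
H = -sum(pi[i] * p * math.log2(p) for i, row in enumerate(P) for p in row)

def information_per_symbol(n, rng):
    """Simulate X_1, ..., X_n from the stationary chain and return -log2 P(C_n) / n."""
    x = 0 if rng.random() < pi[0] else 1
    log_prob = math.log2(pi[x])
    for _ in range(n - 1):
        y = 0 if rng.random() < P[x][0] else 1
        log_prob += math.log2(P[x][y])
        x = y
    return -log_prob / n

rng = random.Random(0)
estimate = information_per_symbol(200_000, rng)   # should be close to H
```

For a single long realisation the value `estimate` lies close to `H`; shorter messages show visibly larger fluctuations.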


14.2.3  The  Asymptotic  Behaviour  of  the  Number  of  the  Most 
Common  Outcomes  in  a  Sequence  of  Trials 

Theorem 14.2.1 has an important corollary. Rank all the messages (words) $C_n$ of length $n$ according to the values of their probabilities in descending order. Next pick the most probable words one by one until the sum of their probabilities exceeds a prescribed level $\alpha$, $0 < \alpha < 1$. Denote the number (and also the set) of the selected words by $M_\alpha(n)$.

Theorem 14.2.2 For each $0 < \alpha < 1$, there exists one and the same limit

\[ \lim_{n \to \infty} \frac{\log M_\alpha(n)}{n} = H. \]

Proof Let $\delta > 0$ be a number, which can be arbitrarily small. We will say that $C_n$ falls into category $K_1$ if its probability $\mathbf{P}(C_n) > 2^{-n(H-\delta)}$, and into category $K_2$ if

\[ 2^{-n(H+\delta)} < \mathbf{P}(C_n) \le 2^{-n(H-\delta)}. \]

Finally, $C_n$ belongs to the third category $K_3$ if

\[ \mathbf{P}(C_n) \le 2^{-n(H+\delta)}. \]

Since, by Theorem 14.2.1, $\mathbf{P}(C_n \in K_1 \cup K_3) \to 0$ as $n \to \infty$, the set $M_\alpha(n)$ contains only the words from $K_1$ and $K_2$, and the last word from $M_\alpha(n)$ (i.e. the one having the smallest probability), which we denote by $C_{\alpha,n}$, belongs to $K_2$. This means that

\[ M_\alpha(n)\,2^{-n(H+\delta)} < \sum_{C_n \in M_\alpha(n)} \mathbf{P}(C_n) \le \alpha + \mathbf{P}(C_{\alpha,n}) \le \alpha + 2^{-n(H-\delta)}. \]

This implies

\[ \frac{\log M_\alpha(n)}{n} < \frac{\log\bigl(\alpha + 2^{-n(H-\delta)}\bigr)}{n} + H + \delta. \]

Since $\delta$ is arbitrary, we have

\[ \limsup_{n \to \infty} \frac{\log M_\alpha(n)}{n} \le H. \]

On the other hand, the words from $K_2$ belonging to $M_\alpha(n)$ have total probability $\ge \alpha - \mathbf{P}(K_1)$. If $M_\alpha^{(2)}(n)$ is the number of these messages then

\[ M_\alpha^{(2)}(n)\,2^{-n(H-\delta)} \ge \alpha - \mathbf{P}(K_1), \]

and, consequently,

\[ M_\alpha(n)\,2^{-n(H-\delta)} \ge \alpha - \mathbf{P}(K_1). \]

Since $\mathbf{P}(K_1) \to 0$ as $n \to \infty$, for sufficiently large $n$ one has

\[ \frac{\log M_\alpha(n)}{n} \ge H - \delta + \frac{1}{n} \log\frac{\alpha}{2}. \]

It follows that

\[ \liminf_{n \to \infty} \frac{\log M_\alpha(n)}{n} \ge H. \]

The theorem is proved. □

Now one can obtain a useful interpretation of this theorem. Let $N$ be the number of the chain states. Suppose for simplicity's sake that $N = 2^m$. Then the number of different words of length $n$ (chains $C_n$) will be equal to $N^n = 2^{nm}$. Suppose, further, that these words are transmitted using a binary code, so that $m$ binary symbols are used to code every state. Thus, with such a transmission method (we will call it direct coding) the length of the messages will be equal to $nm$. (For example, one can use Markov chains to model the Russian language and take $N = 32$, $m = 5$.) The assertion of Theorem 14.2.2 means that, for large $n$, with probability $1 - \varepsilon$, $\varepsilon > 0$, only $2^{nH}$ of the totality of $2^{nm}$ words will be transmitted. The probability of transmitting all the remaining words will be small if $\varepsilon$ is small. From this it is easy to establish the existence of another more economical code requiring, with a large probability, a smaller number of digits to transmit a word. Indeed, one can enumerate the selected $2^{nH}$ most likely words using, say, a binary code again, and then transmit only the number of the word. This clearly requires only $nH$ digits. Since we always have $H \le \log N = m$, the length of the message will be $m/H \ge 1$ times smaller.

This is a special case of the so-called basic coding theorem for Markov chains: for large $n$, there exists a code for which, with a high probability, the original message $C_n$ can be transmitted by a sequence of signals which is $m/H$ times shorter than in the case of the direct coding.

The above coding method is an oversimplified example rather than a recipe for efficiently compressing the messages. It should be noted that finding a really efficient coding method is a rather difficult task. For example, in Morse code it is reasonable to encode more frequent letters by shorter sequences of dots and dashes. However, the text reduction by $m/H$ times would not be achieved. Certain compression techniques have been used in this book as well. For example, we replaced the frequently encountered words “characteristic function” by “ch.f.” We could achieve better results if, say, shorthand was used. The structure of a code with a high compression coefficient will certainly be very complicated. The theorems of the present chapter give an upper bound for the results we can achieve.
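For short messages the number $M_\alpha(n)$ of Theorem 14.2.2 can be found by direct enumeration. The sketch below is an illustration under stated assumptions: a hypothetical two-state chain (so $N = 2$, $m = 1$) with stationary distribution `pi`, base-2 logarithms, and a word length small enough to list all $2^n$ words.

```python
import itertools
import math

# Hypothetical two-state chain and its stationary distribution.
P = [[0.9, 0.1],
     [0.4, 0.6]]
pi = [0.8, 0.2]
N = 2                      # number of states, so m = log2 N = 1

def word_probability(word):
    """P(C_n) = pi_{x_1} * p_{x_1 x_2} * ... * p_{x_{n-1} x_n}."""
    prob = pi[word[0]]
    for a, b in zip(word, word[1:]):
        prob *= P[a][b]
    return prob

def most_common_count(n, alpha):
    """Number M_alpha(n) of most probable words whose total probability exceeds alpha."""
    probs = sorted((word_probability(w)
                    for w in itertools.product(range(N), repeat=n)), reverse=True)
    total = 0.0
    for count, prob in enumerate(probs, start=1):
        total += prob
        if total > alpha:
            return count
    return len(probs)

n, alpha = 14, 0.9
M = most_common_count(n, alpha)
rate = math.log2(M) / n    # to be compared with H (about 0.57 for this chain) and m = 1
```

Already for moderate $n$ far fewer than $2^n$ words are needed to cover probability $\alpha$, so `rate` is below $m = 1$. Convergence of `rate` to $H$ is slow, and the enumeration becomes infeasible long before the limit is approached.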

Since $H = \log N = m$ for a sequence of independent equiprobable symbols, such a text is incompressible. This is why the proximity of “new” messages (encoded using a new alphabet) to a sequence of equiprobable symbols could serve as a criterion for constructing new codes.

It  should  be  taken  into  account,  however,  that  the  text  “redundancy”  we  are 
“fighting”  with  is  in  many  cases  a  useful  and  helpful  phenomenon.  Without  such 
redundancy,  it  would  be  impossible  to  detect  misprints  or  reconstruct  omissions  as 
easily  as  we,  say,  restore  the  letter  “r”  in  the  word  “info  •  mation”. 

The reader might know how difficult it is to read a highly abridged and formalised mathematical text. While working with an ideal code no errors would be admissible (even if we could find any), since it is impossible to reconstruct an omitted or distorted symbol in a sequence of equiprobable digits. In this connection, there arises one of the basic problems of information theory: to find a code with the smallest “redundancy” which still allows one to eliminate the transmission noise.


Chapter  15 

Martingales 


Abstract The definitions, simplest properties and first examples of martingales and sub/super-martingales are given in Sect. 15.1. Stopping (Markov) times are introduced in Sect. 15.2, which also contains Doob's theorem on random change of time and Wald's identity together with a number of its applications to boundary crossing problems and elsewhere. This is followed by Sect. 15.3 presenting fundamental martingale inequalities, including Doob's inequality with a number of its consequences, and an inequality for the number of strip crossings. Section 15.4 begins with Doob's martingale convergence theorem and also presents Lévy's theorem and an application to branching processes. Section 15.5 derives several important inequalities for the moments of stochastic sequences.


15.1  Definitions,  Simplest  Properties,  and  Examples 

In Chap. 13 we considered sequences of dependent random variables $X_0, X_1, \dots$ forming Markov chains. Dependence was described there in terms of transition probabilities determining the distribution of $X_{n+1}$ given $X_n$. That enabled us to investigate rather completely the properties of Markov chains.

In this chapter we consider another type of sequence of dependent random variables. Now dependence will be characterised only by the mean value of $X_{n+1}$ given the whole “history” $X_0, \dots, X_n$. It turns out that one can also obtain rather general results for such sequences.

Let a probability space $(\Omega, \mathfrak{F}, \mathbf{P})$ be given together with a sequence of random variables $X_0, X_1, \dots$ defined on it and an increasing family (or flow) of $\sigma$-algebras $\{\mathfrak{F}_n\}_{n \ge 0}$: $\mathfrak{F}_0 \subseteq \mathfrak{F}_1 \subseteq \dots \subseteq \mathfrak{F}_n \subseteq \dots \subseteq \mathfrak{F}$.

Definition 15.1.1 A sequence of pairs $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ is called a stochastic sequence if, for each $n \ge 0$, $X_n$ is $\mathfrak{F}_n$-measurable. A stochastic sequence is said to be a martingale (one also says that $\{X_n\}$ is a martingale with respect to the flow of $\sigma$-algebras $\{\mathfrak{F}_n\}$) if, for every $n \ge 0$,

(1) $\mathbf{E}|X_n| < \infty$, (15.1.1)

(2) $X_n$ is measurable with respect to $\mathfrak{F}_n$,

(3) $\mathbf{E}(X_{n+1} \mid \mathfrak{F}_n) = X_n$. (15.1.2)

A stochastic sequence $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ is called a submartingale (supermartingale) if conditions (1)-(3) hold with the sign “=” replaced in (15.1.2) with “≥” (“≤”, respectively).

We will say that a sequence $\{X_n\}$ forms a martingale (submartingale, supermartingale) if, for $\mathfrak{F}_n = \sigma(X_0, \dots, X_n)$, the pairs $\{X_n, \mathfrak{F}_n\}$ form a sequence with the same name. Submartingales and supermartingales are often called semimartingales.

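It is evident that relation (15.1.2) persists if we replace $X_{n+1}$ on its left-hand side with $X_m$ for any $m > n$. Indeed, by virtue of the properties of conditional expectations,

\[ \mathbf{E}(X_m \mid \mathfrak{F}_n) = \mathbf{E}\bigl[\mathbf{E}(X_m \mid \mathfrak{F}_{m-1}) \bigm| \mathfrak{F}_n\bigr] = \mathbf{E}(X_{m-1} \mid \mathfrak{F}_n) = \dots = X_n. \]

A similar assertion holds for semimartingales.

If $\{X_n\}$ is a martingale, then $\mathbf{E}(X_{n+1} \mid \sigma(X_0, \dots, X_n)) = X_n$, and, by a property of conditional expectations,

\[ \mathbf{E}(X_{n+1} \mid \sigma(X_n)) = \mathbf{E}\bigl[\mathbf{E}(X_{n+1} \mid \sigma(X_0, \dots, X_n)) \bigm| \sigma(X_n)\bigr] = \mathbf{E}(X_n \mid \sigma(X_n)) = X_n. \]

So, for martingales, as for Markov chains, we have

\[ \mathbf{E}(X_{n+1} \mid \sigma(X_0, \dots, X_n)) = \mathbf{E}(X_{n+1} \mid \sigma(X_n)). \]

The similarity, however, is limited to this relation, because for a martingale, the equality does not hold for distributions, but the additional condition

\[ \mathbf{E}(X_{n+1} \mid \sigma(X_n)) = X_n \]

is imposed.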


Example 15.1.1 Let $\xi_n$, $n \ge 0$, be independent. Then $X_n = \xi_0 + \dots + \xi_n$ form a martingale (submartingale, supermartingale) if $\mathbf{E}\xi_n = 0$ ($\mathbf{E}\xi_n \ge 0$, $\mathbf{E}\xi_n \le 0$). It is obvious that $X_n$ also form a Markov chain. The same is true of $X_n = \prod_{k=0}^n \xi_k$ if $\mathbf{E}\xi_n = 1$.

Example 15.1.2 Let $\xi_n$, $n \ge 0$, be independent. Then

\[ X_n = \sum_{k=1}^n \xi_{k-1}\xi_k, \quad n \ge 1, \qquad X_0 = \xi_0, \]

form a martingale if $\mathbf{E}\xi_n = 0$, because

\[ \mathbf{E}\bigl(X_{n+1} \bigm| \sigma(X_0, \dots, X_n)\bigr) = X_n + \mathbf{E}\bigl(\xi_n\xi_{n+1} \bigm| \sigma(\xi_n)\bigr) = X_n. \]

Clearly, $\{X_n\}$ is not a Markov chain here. An example of a sequence which is a Markov chain but not a martingale can be obtained, say, if we consider a random walk on a segment with reflection at the endpoints (see Example 13.1.1).
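The product martingale mentioned in Example 15.1.1 can be checked by direct enumeration. In the sketch below the two values of $\xi_k$ are a hypothetical choice with $\mathbf{E}\xi_k = 1$; the code only verifies the consequence $\mathbf{E}X_n = \text{const}$ of the martingale property, not the property itself.

```python
import itertools
import math

FACTORS = (0.5, 1.5)   # hypothetical values of xi_k, each taken with probability 1/2

def mean_of_product(n):
    """E X_n for X_n = xi_0 * ... * xi_n, averaging over all 2^(n+1) equally likely outcomes."""
    outcomes = itertools.product(FACTORS, repeat=n + 1)
    return sum(math.prod(w) for w in outcomes) / 2 ** (n + 1)

means = [mean_of_product(n) for n in range(8)]   # one value of E X_n per n = 0, ..., 7
```

All entries of `means` equal 1, although most individual trajectories of such a product decrease towards zero.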

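As well as $\{0, 1, \dots\}$ we will use other sets of indices for $X_n$, for example, $\{-\infty < n < \infty\}$ or $\{n \le -1\}$, and also sets of integers including infinite values $\pm\infty$, say, $\{0 \le n \le \infty\}$. We will denote these sets by a common symbol $\mathcal{N}$ and write martingales (semimartingales) as $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$. By $\mathfrak{F}_{-\infty}$ we will understand the $\sigma$-algebra $\bigcap_{n \in \mathcal{N}} \mathfrak{F}_n$, and by $\mathfrak{F}_\infty$ the $\sigma$-algebra $\sigma\bigl(\bigcup_{n \in \mathcal{N}} \mathfrak{F}_n\bigr)$ generated by $\bigcup_{n \in \mathcal{N}} \mathfrak{F}_n$, so that $\mathfrak{F}_{-\infty} \subseteq \mathfrak{F}_n \subseteq \mathfrak{F}_\infty \subseteq \mathfrak{F}$ for any $n \in \mathcal{N}$.

Definition 15.1.2 A stochastic sequence $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is called a martingale (submartingale, supermartingale), if the conditions of Definition 15.1.1 hold for any $n \in \mathcal{N}$.

If $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a martingale and the left boundary $n_0$ of $\mathcal{N}$ is finite (for example, $\mathcal{N} = \{0, 1, \dots\}$), then the martingale $\{X_n, \mathfrak{F}_n\}$ can always be extended “to the whole axis” by setting $\mathfrak{F}_n := \mathfrak{F}_{n_0}$ and $X_n := X_{n_0}$ for $n < n_0$. The same holds for the right boundary as well. Therefore if a martingale (semimartingale) $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is given, then without loss of generality we can always assume that one is actually given a martingale (semimartingale) $\{X_n, \mathfrak{F}_n;\ -\infty \le n \le \infty\}$.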


Example 15.1.3 Let $\{\mathfrak{F}_n,\ -\infty \le n \le \infty\}$ be a given sequence of increasing $\sigma$-algebras, and $\xi$ a random variable on $(\Omega, \mathfrak{F}, \mathbf{P})$, $\mathbf{E}|\xi| < \infty$. Then $\{X_n, \mathfrak{F}_n;\ -\infty \le n \le \infty\}$ with $X_n = \mathbf{E}(\xi \mid \mathfrak{F}_n)$ forms a martingale.

Indeed, by the property of conditional expectations, for any $m \le \infty$ and $m > n$,

\[ \mathbf{E}(X_m \mid \mathfrak{F}_n) = \mathbf{E}\bigl[\mathbf{E}(\xi \mid \mathfrak{F}_m) \bigm| \mathfrak{F}_n\bigr] = \mathbf{E}(\xi \mid \mathfrak{F}_n) = X_n. \]

Definition 15.1.3 The martingale of Example 15.1.3 is called a martingale generated by the random variable $\xi$ (and the family $\{\mathfrak{F}_n\}$).

Definition 15.1.4 A set $\mathcal{N}_+$ is called the right closure of $\mathcal{N}$ if:

(1) $\mathcal{N}_+ = \mathcal{N}$ when the maximal element of $\mathcal{N}$ is finite;

(2) $\mathcal{N}_+ = \mathcal{N} \cup \{\infty\}$ if $\mathcal{N}$ is not bounded from the right.

If $\mathcal{N} = \mathcal{N}_+$ then we say that $\mathcal{N}$ is right closed. A martingale (semimartingale) $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is said to be right closed if $\mathcal{N}$ is right closed.

Lemma 15.1.1 A martingale $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is generated by a random variable if and only if it is right closed.

The Proof of the lemma is trivial. In one direction it follows from Example 15.1.3, and in the other from the equality

\[ \mathbf{E}(X_N \mid \mathfrak{F}_n) = X_n, \qquad N = \sup\{k;\ k \in \mathcal{N}\}, \]

which implies that $\{X_n, \mathfrak{F}_n\}$ is generated by $X_N$. The lemma is proved. □

Now we consider an interesting and more concrete example of a martingale generated by a random variable.


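Example 15.1.4 Let $\xi_1, \xi_2, \dots$ be independent and identically distributed and assume $\mathbf{E}|\xi_1| < \infty$. Set

\[ S_n = \xi_1 + \dots + \xi_n, \qquad X_{-n} = S_n/n, \qquad \mathfrak{F}_{-n} = \sigma(S_n, S_{n+1}, \dots) = \sigma(S_n, \xi_{n+1}, \dots). \]

Then $\mathfrak{F}_{-n} \subset \mathfrak{F}_{-n+1}$ and, for any $1 \le k \le n$, by symmetry

\[ \mathbf{E}(\xi_k \mid \mathfrak{F}_{-n}) = \mathbf{E}(\xi_1 \mid \mathfrak{F}_{-n}). \]

From this it follows that

\[ S_n = \mathbf{E}(S_n \mid \mathfrak{F}_{-n}) = \sum_{k=1}^n \mathbf{E}(\xi_k \mid \mathfrak{F}_{-n}) = n\,\mathbf{E}(\xi_1 \mid \mathfrak{F}_{-n}), \qquad \frac{S_n}{n} = \mathbf{E}(\xi_1 \mid \mathfrak{F}_{-n}). \]

This means that $\{X_n, \mathfrak{F}_n;\ n \le -1\}$ forms a martingale generated by $\xi_1$.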


We will now obtain a series of auxiliary assertions giving the simplest properties of martingales and semimartingales. When considering semimartingales, we will confine ourselves to submartingales only, since the corresponding properties of supermartingales will follow immediately if one considers the sequence $Y_n = -X_n$, where $\{X_n\}$ is a submartingale.

Lemma 15.1.2

(1) The property that $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a martingale is equivalent to invariability in $m \ge n$ of the set functions (integrals)

\[ \mathbf{E}(X_m; A) = \mathbf{E}(X_n; A) \tag{15.1.3} \]

for any $A \in \mathfrak{F}_n$. In particular, $\mathbf{E}X_m = \text{const}$.

(2) The property that $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a submartingale is equivalent to the monotone increase in $m \ge n$ of the set functions

\[ \mathbf{E}(X_m; A) \ge \mathbf{E}(X_n; A) \tag{15.1.4} \]

for every $A \in \mathfrak{F}_n$. In particular, $\mathbf{E}X_m \uparrow$.

The Proof follows immediately from the definitions. If (15.1.3) holds then, by the definition of conditional expectation, $X_n = \mathbf{E}(X_m \mid \mathfrak{F}_n)$, and vice versa. Now let (15.1.4) hold. Put $Y_n = \mathbf{E}(X_m \mid \mathfrak{F}_n)$. Then (15.1.4) implies that $\mathbf{E}(Y_n; A) \ge \mathbf{E}(X_n; A)$ and $\mathbf{E}(Y_n - X_n; A) \ge 0$ for any $A \in \mathfrak{F}_n$. From this it follows that $Y_n = \mathbf{E}(X_m \mid \mathfrak{F}_n) \ge X_n$ with probability 1. The converse assertion can be obtained as easily as the direct one. The lemma is proved. □

Lemma 15.1.3 Let $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ be a martingale, $g(x)$ be a convex function, and $\mathbf{E}|g(X_n)| < \infty$. Then $\{g(X_n), \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a submartingale.

If, in addition, $g(x)$ is nondecreasing, then the assertion of the lemma remains true when $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a submartingale.

The Proof of both assertions follows immediately from Jensen's inequality

\[ \mathbf{E}(g(X_{n+1}) \mid \mathfrak{F}_n) \ge g\bigl(\mathbf{E}(X_{n+1} \mid \mathfrak{F}_n)\bigr) \ge g\bigl(\mathbf{E}(X_n \mid \mathfrak{F}_n)\bigr) = g(X_n). \quad □ \]

Clearly, the function $g(x) = |x|^p$ for $p \ge 1$ satisfies the conditions of the first part of the lemma, and the function $g(x) = e^{\lambda x}$ for $\lambda > 0$ meets the conditions of the second part of the lemma.
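Combined with Lemma 15.1.2, the lemma implies that $\mathbf{E}g(X_n)$ is nondecreasing in $n$ for a martingale $\{X_n\}$ and convex $g$. A minimal sketch for $g(x) = |x|$, assuming the symmetric random walk $S_n$ with steps $\pm 1$ (a choice made only for this example) and using its exact binomial distribution:

```python
from math import comb

def expected_abs(n):
    """E|S_n| for the symmetric +-1 random walk, computed from the exact binomial law."""
    # S_n = 2k - n when k of the n steps are equal to +1.
    return sum(comb(n, k) * abs(2 * k - n) for k in range(n + 1)) / 2 ** n

values = [expected_abs(n) for n in range(11)]   # E|S_0|, ..., E|S_10|
```

The list `values` is nondecreasing, as it must be for the submartingale $\{|S_n|\}$.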

Lemma 15.1.4 Let $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ be a right closed submartingale. Then, for $X_n(a) = \max\{X_n, a\}$ and any $a$, $\{X_n(a), \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a uniformly integrable submartingale.

If $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a right closed martingale, then it is uniformly integrable.

Proof Let $N := \sup\{k : k \in \mathcal{N}\}$. Then, by Lemma 15.1.3, $\{X_n(a), \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a submartingale. Hence, for any $c > 0$,

\[ c\,\mathbf{P}(X_n(a) > c) \le \mathbf{E}(X_n(a);\ X_n(a) > c) \le \mathbf{E}(X_N(a);\ X_n(a) > c) \le \mathbf{E}X_N^+(a) \]

(here $X^+ = \max(0, X)$) and so

\[ \mathbf{P}(X_n(a) > c) \le \frac{1}{c}\,\mathbf{E}\bigl(X_N^+(a)\bigr) \to 0, \]

uniformly in $n$ as $c \to \infty$. Therefore we get the required uniform integrability:

\[ \sup_n \mathbf{E}(X_n(a);\ X_n(a) > c) \le \sup_n \mathbf{E}(X_N(a);\ X_n(a) > c) \to 0, \]

since $\sup_n \mathbf{P}(X_n(a) > c) \to 0$ as $c \to \infty$ (see Lemma A3.2.3 in Appendix 3; by truncating at the level $a$ we avoided estimating the “negative tails”).

If $\{X_n, \mathfrak{F}_n;\ n \in \mathcal{N}\}$ is a martingale, then its uniform integrability will follow from the first assertion of the lemma applied to the submartingale $\{|X_n|, \mathfrak{F}_n;\ n \in \mathcal{N}\}$. The lemma is proved. □

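The nature of martingales can be clarified to some extent by the following example.

Example 15.1.5 Let $\xi_1, \xi_2, \dots$ be an arbitrary sequence of random variables, $\mathbf{E}|\xi_k| < \infty$, $\mathfrak{F}_n = \sigma(\xi_1, \dots, \xi_n)$ for $n \ge 1$, $\mathfrak{F}_0 = (\emptyset, \Omega)$ (the trivial $\sigma$-algebra),

\[ S_n = \sum_{k=1}^n \xi_k, \qquad Z_n = \sum_{k=1}^n \mathbf{E}(\xi_k \mid \mathfrak{F}_{k-1}), \qquad X_n = S_n - Z_n. \]

Then $\{X_n, \mathfrak{F}_n;\ n \ge 1\}$ is a martingale. This is a consequence of the fact that

\[ \mathbf{E}(S_{n+1} - Z_{n+1} \mid \mathfrak{F}_n) = \mathbf{E}\bigl(X_n + \xi_{n+1} - \mathbf{E}(\xi_{n+1} \mid \mathfrak{F}_n) \bigm| \mathfrak{F}_n\bigr) = X_n. \]

In other words, for an arbitrary sequence $\{\xi_n\}$, the sequence $S_n$ can be “compensated” by a so-called “predictable” (in the sense that its value is determined by $S_1, \dots, S_{n-1}$) sequence $Z_n$ so that $S_n - Z_n$ will be a martingale.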
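The compensation in Example 15.1.5 can be verified exactly for a simple dependent sequence. The sketch below assumes a hypothetical model: $\xi_1 = \pm 1$ with probability 1/2 each, and $\xi_{k+1} = \xi_k$ with probability `STAY`, so that $\mathbf{E}(\xi_{k+1} \mid \mathfrak{F}_k) = (2\,\mathtt{STAY} - 1)\,\xi_k$. It enumerates all histories of a given length and measures how far $\mathbf{E}(X_{n+1} \mid \mathfrak{F}_n)$ is from $X_n$.

```python
import itertools

STAY = 0.7   # hypothetical persistence: xi_{k+1} = xi_k with this probability

def cond_mean_next(path):
    """E(xi_{n+1} | xi_1, ..., xi_n) in the illustrative model."""
    if not path:
        return 0.0                       # xi_1 is a fair sign
    return (2 * STAY - 1) * path[-1]

def compensated(path):
    """X_n = S_n - Z_n with Z_n = sum_{k=1}^n E(xi_k | xi_1, ..., xi_{k-1})."""
    s = sum(path)
    z = sum(cond_mean_next(path[:k]) for k in range(len(path)))
    return s - z

def max_martingale_defect(n):
    """Largest |E(X_{n+1} | F_n) - X_n| over all histories of length n >= 1."""
    worst = 0.0
    for path in itertools.product((-1, 1), repeat=n):
        last = path[-1]
        expected_next = (STAY * compensated(path + (last,))
                         + (1 - STAY) * compensated(path + (-last,)))
        worst = max(worst, abs(expected_next - compensated(path)))
    return worst

defect = max_martingale_defect(6)
```

The value `defect` vanishes up to rounding error, whereas the uncompensated sums $S_n$ are not a martingale in this model because $\mathbf{E}(\xi_{n+1} \mid \mathfrak{F}_n) \ne 0$.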
15.2  The  Martingale  Property  and  Random  Change  of  Time.
Wald’s  Identity

Throughout this section we assume that $\mathcal{N} = \{n \ge 0\}$. Recall the definition of a stopping time.

Definition 15.2.1 A random variable $\nu$ will be called a stopping time or a Markov time (with respect to an increasing family of $\sigma$-algebras $\{\mathfrak{F}_n;\ n \ge 0\}$) if, for any $n \ge 0$, $\{\nu \le n\} \in \mathfrak{F}_n$.

It is obvious that a constant $\nu \equiv m$ is a stopping time. If $\nu$ is a stopping time, then, for any fixed $m$, $\nu(m) = \min(\nu, m)$ is also a stopping time, since for $n \ge m$ we have

\[ \nu(m) \le m \le n, \qquad \{\nu(m) \le n\} = \Omega \in \mathfrak{F}_n, \]

and if $n < m$ then

\[ \{\nu(m) \le n\} = \{\nu \le n\} \in \mathfrak{F}_n. \]

If $\nu$ is a stopping time, then

\[ \{\nu = n\} = \{\nu \le n\} \setminus \{\nu \le n-1\} \in \mathfrak{F}_n, \qquad \{\nu \ge n\} = \Omega \setminus \{\nu \le n-1\} \in \mathfrak{F}_{n-1}. \]

Conversely, if $\{\nu = n\} \in \mathfrak{F}_n$ for all $n$, then $\{\nu \le n\} \in \mathfrak{F}_n$ and therefore $\nu$ is a stopping time.

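Let a martingale $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ be given. A typical example of a stopping time is the time $\nu$ at which $X_n$ first hits a given measurable set $B$:

\[ \nu = \inf\{n \ge 0 : X_n \in B\} \]

($\nu = \infty$ if all $X_n \notin B$). Indeed,

\[ \{\nu = n\} = \{X_0 \notin B, \dots, X_{n-1} \notin B,\ X_n \in B\} \in \mathfrak{F}_n. \]

If $\nu$ is a proper stopping time ($\mathbf{P}(\nu < \infty) = 1$), then $X_\nu$ is a random variable, since

\[ X_\nu = \sum_{n=0}^\infty X_n I_{\{\nu = n\}}. \]

By $\mathfrak{F}_\nu$ we will denote the $\sigma$-algebra of sets $A \in \mathfrak{F}$ such that $A \cap \{\nu = n\} \in \mathfrak{F}_n$, $n = 0, 1, \dots$ This $\sigma$-algebra can be thought of as being generated by the events $\{\nu \le n\} \cap B_n$, $n = 0, 1, \dots$, where $B_n \in \mathfrak{F}_n$. Clearly, $\nu$ and $X_\nu$ are $\mathfrak{F}_\nu$-measurable. If $\nu_1$ and $\nu_2$ are two stopping times, then $\{\nu_2 \ge \nu_1\} \in \mathfrak{F}_{\nu_1}$ and $\{\nu_2 \ge \nu_1\} \in \mathfrak{F}_{\nu_2}$, since $\{\nu_2 \ge \nu_1\} = \bigcup_n \bigl[\{\nu_2 = n\} \cap \{\nu_1 \le n\}\bigr]$.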

We already know that if $\{X_n, \mathfrak{F}_n\}$ is a martingale then $\mathbf{E}X_n$ is constant for all $n$. Will this property remain valid for $\mathbf{E}X_\nu$ if $\nu$ is a stopping time? From Wald's identity we know that this is the case for the martingale from Example 15.1.1. In the general case one has the following.

Theorem 15.2.1 (Doob) Let $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ be a martingale (submartingale) and $\nu_1$, $\nu_2$ be stopping times such that

\[ \mathbf{E}|X_{\nu_i}| < \infty, \quad i = 1, 2, \tag{15.2.1} \]

\[ \liminf_{n \to \infty} \mathbf{E}\bigl(|X_n|;\ \nu_2 \ge n\bigr) = 0. \tag{15.2.2} \]

Then, on the set $\{\nu_2 \ge \nu_1\}$,

\[ \mathbf{E}(X_{\nu_2} \mid \mathfrak{F}_{\nu_1}) = X_{\nu_1} \quad (\ge X_{\nu_1}). \tag{15.2.3} \]

This theorem extends the martingale (submartingale) property to random time.

Corollary 15.2.1 If $\nu_2 = \nu \ge 0$ is an arbitrary stopping time, then putting $\nu_1 = n$ (also a stopping time) we have that, on the set $\{\nu \ge n\}$,

\[ \mathbf{E}(X_\nu \mid \mathfrak{F}_n) = X_n, \qquad \mathbf{E}X_\nu = \mathbf{E}X_0, \]

or, which is the same, for any $A \in \mathfrak{F}_n \cap \{\nu \ge n\}$,

\[ \mathbf{E}(X_\nu; A) = \mathbf{E}(X_n; A). \]

For submartingales substitute “=” by “≥”.

Proof of Theorem 15.2.1 To prove (15.2.3) it suffices to show that, for any $A \in \mathfrak{F}_{\nu_1}$,

\[ \mathbf{E}(X_{\nu_2};\ A \cap \{\nu_2 \ge \nu_1\}) = \mathbf{E}(X_{\nu_1};\ A \cap \{\nu_2 \ge \nu_1\}). \tag{15.2.4} \]

Since the random variables $\nu_i$ are discrete, we just have to establish (15.2.4) for sets $A_n = A \cap \{\nu_1 = n\} \in \mathfrak{F}_n$, $n = 0, 1, \dots$, i.e. to establish the equality

\[ \mathbf{E}(X_{\nu_2};\ A_n \cap \{\nu_2 \ge n\}) = \mathbf{E}(X_n;\ A_n \cap \{\nu_2 \ge n\}). \tag{15.2.5} \]

Thus the proof is reduced to the case $\nu_1 = n$. We have

\[ \begin{aligned} \mathbf{E}(X_n;\ A_n \cap \{\nu_2 \ge n\}) &= \mathbf{E}(X_n;\ A_n \cap \{\nu_2 = n\}) + \mathbf{E}(X_n;\ A_n \cap \{\nu_2 \ge n+1\}) \\ &= \mathbf{E}(X_{\nu_2};\ A_n \cap \{\nu_2 = n\}) + \mathbf{E}(X_{n+1};\ A_n \cap \{\nu_2 \ge n+1\}). \end{aligned} \]

Here we used the fact that $\{\nu_2 \ge n+1\} \in \mathfrak{F}_n$ and the martingale property (15.1.3). Applying this equality $m - n$ times we obtain that

\[ \mathbf{E}(X_{\nu_2};\ A_n \cap \{n \le \nu_2 < m\}) = \mathbf{E}(X_n;\ A_n \cap \{\nu_2 \ge n\}) - \mathbf{E}(X_m;\ A_n \cap \{\nu_2 \ge m\}). \tag{15.2.6} \]

By (15.2.2) the last term on the right-hand side converges to zero for some sequence $m \to \infty$. Since

\[ A_{n,m} := A_n \cap \{n \le \nu_2 < m\} \uparrow B_n = A_n \cap \{n \le \nu_2\}, \]

by the property of integrals and by virtue of (15.2.6),

\[ \mathbf{E}(X_{\nu_2};\ A_n \cap \{n \le \nu_2\}) = \lim_{m \to \infty} \mathbf{E}(X_{\nu_2};\ A_{n,m}) = \mathbf{E}(X_n;\ A_n \cap \{\nu_2 \ge n\}). \]

Thus we proved equality (15.2.5) and hence Theorem 15.2.1 for martingales. The proof for submartingales can be obtained by simply changing the equality signs in certain places to inequalities. The theorem is proved. □

The conditions of Theorem 15.2.1 are far from always being met, even in rather simple cases. Consider, for instance, a fair game (see Examples 4.2.3 and 4.4.5) versus an infinitely rich adversary, in which $z + S_n$ is the fortune of the first gambler after $n$ plays (given he has not been ruined yet). Here $z > 0$, $S_n = \sum_{k=1}^n \xi_k$, $\mathbf{P}(\xi_k = \pm 1) = 1/2$, $\eta(z) = \min\{k : S_k = -z\}$ is obviously a Markov (stopping) time, and the sequence $\{S_n;\ n \ge 0\}$, $S_0 = 0$, is a martingale, but $S_{\eta(z)} = -z$. Hence $\mathbf{E}S_{\eta(z)} = -z \ne \mathbf{E}S_n = 0$, and equality (15.2.5) does not hold for $\nu_1 = 0$, $\nu_2 = \eta(z)$, $z > 0$, $n \ge 0$. In this example, this means that condition (15.2.2) is not satisfied (this is related to the fact that $\mathbf{E}\eta(z) = \infty$).
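The failure of condition (15.2.2) in this example can be seen numerically. The sketch below is only an illustration, with $z = 3$ chosen arbitrarily: it propagates the exact distribution of the walk stopped at the level $-z$ and returns $\mathbf{P}(\eta(z) \le n)$, $\mathbf{E}(|S_n|;\ \eta(z) > n)$ and $\mathbf{E}S_{\min(\eta(z), n)}$.

```python
def stopped_walk_summary(z, n):
    """Exact law after n steps of the fair +-1 walk stopped on first reaching -z.

    Returns (P(eta <= n), E(|S_n|; eta > n), E S_{min(eta, n)}).
    """
    alive = {0: 1.0}          # position -> P(S_k = position, eta > k)
    absorbed = 0.0            # P(eta <= k)
    for _ in range(n):
        nxt = {}
        for pos, prob in alive.items():
            for step in (-1, 1):
                new = pos + step
                if new == -z:                      # ruin: the walk stops at -z
                    absorbed += prob / 2
                else:
                    nxt[new] = nxt.get(new, 0.0) + prob / 2
        alive = nxt
    tail = sum(abs(pos) * prob for pos, prob in alive.items())
    mean_stopped = -z * absorbed + sum(pos * prob for pos, prob in alive.items())
    return absorbed, tail, mean_stopped

results = {n: stopped_walk_summary(3, n) for n in (10, 100, 1000)}
```

For every fixed $n$ the stopped mean is zero, in agreement with Theorem 15.2.1 applied to the bounded stopping time $\min(\eta(z), n)$. The ruin probability tends to 1, yet $\mathbf{E}(|S_n|;\ \eta(z) > n) \ge z\,\mathbf{P}(\eta(z) \le n)$ stays bounded away from zero, so (15.2.2) fails.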

Conditions (15.2.1) and (15.2.2) of Theorem 15.2.1 can, generally speaking, be rather hard to verify. Therefore the following statements are useful in applications.

Put for brevity

\[ \xi_n := X_n - X_{n-1}, \qquad \xi_0 := X_0, \qquad Y_n := \sum_{k=0}^n |\xi_k|, \qquad n = 0, 1, \dots \]

Lemma 15.2.1 The condition

\[ \mathbf{E}Y_\nu < \infty \tag{15.2.7} \]

is sufficient for (15.2.1) and (15.2.2) (with $\nu_i = \nu$).

The Proof is almost evident since $|X_\nu| \le Y_\nu$ and

\[ \mathbf{E}(|X_n|;\ \nu \ge n) \le \mathbf{E}(Y_\nu;\ \nu \ge n). \]

Because $\mathbf{P}(\nu \ge n) \to 0$ and $\mathbf{E}Y_\nu < \infty$, it remains to use the property of integrals by which $\mathbf{E}(\eta; A_n) \to 0$ if $\mathbf{E}|\eta| < \infty$ and $\mathbf{P}(A_n) \to 0$. □


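We introduce the following notation:

\[ a_n := \mathbf{E}(|\xi_n| \mid \mathfrak{F}_{n-1}), \qquad \sigma_n^2 := \mathbf{E}(\xi_n^2 \mid \mathfrak{F}_{n-1}), \qquad n = 0, 1, 2, \dots, \]

where $\mathfrak{F}_{-1}$ can be taken to be the trivial $\sigma$-algebra.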


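Theorem 15.2.2 Let $\{X_n;\ n \ge 0\}$ be a martingale (submartingale) and $\nu$ be a stopping time (with respect to $\{\mathfrak{F}_n = \sigma(X_0, \dots, X_n)\}$).

(1) If

\[ \mathbf{E}\nu < \infty \tag{15.2.8} \]

and, for all $n \ge 0$, on the set $\{\nu \ge n\} \in \mathfrak{F}_{n-1}$ one has

\[ a_n \le c = \text{const}, \tag{15.2.9} \]

then

\[ \mathbf{E}|X_\nu| < \infty, \qquad \mathbf{E}X_\nu = \mathbf{E}X_0 \quad (\ge \mathbf{E}X_0). \tag{15.2.10} \]

(2) If, in addition, $\mathbf{E}\sigma_n^2 = \mathbf{E}\xi_n^2 < \infty$ then

\[ \mathbf{E}X_\nu^2 = \mathbf{E}\sum_{k=1}^{\nu} \sigma_k^2. \tag{15.2.11} \]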


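Proof By virtue of Theorem 15.2.1, Corollary 15.2.1 and Lemma 15.2.1, to prove (15.2.10) it suffices to verify that conditions (15.2.8) and (15.2.9) imply (15.2.7). Quite similarly to the proof of Theorem 4.4.1, we have

$$\mathbf{E}|Y_\nu| = \sum_{n=0}^{\infty} \mathbf{E}\Big(\sum_{k=0}^{n} |\xi_k|;\ \nu = n\Big) = \sum_{k=0}^{\infty} \sum_{n=k}^{\infty} \mathbf{E}(|\xi_k|;\ \nu = n) = \sum_{k=0}^{\infty} \mathbf{E}(|\xi_k|;\ \nu \ge k).$$

Here $\{\nu \ge k\} = \Omega \setminus \{\nu \le k - 1\} \in \mathfrak{F}_{k-1}$. Therefore, by condition (15.2.9),

$$\mathbf{E}(|\xi_k|;\ \nu \ge k) = \mathbf{E}\big(\mathbf{E}(|\xi_k| \mid \mathfrak{F}_{k-1});\ \nu \ge k\big) \le c\,\mathbf{P}(\nu \ge k).$$

This means that

$$\mathbf{E} Y_\nu \le c \sum_{k=0}^{\infty} \mathbf{P}(\nu \ge k) = c\,(1 + \mathbf{E}\nu) < \infty.$$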


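Now we will prove (15.2.11). Set $Z_n := X_n^2 - \sum_{k=0}^{n} \sigma_k^2$. One can easily see that $Z_n$ is a martingale, since

$$\mathbf{E}(X_{n+1}^2 - X_n^2 - \sigma_{n+1}^2 \mid \mathfrak{F}_n) = \mathbf{E}(2X_n\xi_{n+1} + \xi_{n+1}^2 - \sigma_{n+1}^2 \mid \mathfrak{F}_n) = 0.$$

It is also clear that $\mathbf{E}|Z_n| < \infty$ and $\nu(n) = \min(\nu, n)$ is a stopping time. By virtue of Lemma 15.2.1, conditions (15.2.1) and (15.2.2) always hold for the pair $\{Z_k\}$, $\nu(n)$. Therefore, by the first part of the theorem,

$$\mathbf{E} Z_{\nu(n)} = 0, \quad \mathbf{E} X_{\nu(n)}^2 = \mathbf{E} \sum_{k=1}^{\nu(n)} \sigma_k^2. \tag{15.2.12}$$

It remains to verify that

$$\lim_{n \to \infty} \mathbf{E} X_{\nu(n)}^2 = \mathbf{E} X_\nu^2, \quad \lim_{n \to \infty} \mathbf{E} \sum_{k=1}^{\nu(n)} \sigma_k^2 = \mathbf{E} \sum_{k=1}^{\nu} \sigma_k^2. \tag{15.2.13}$$

The second equality follows from the monotone convergence theorem ($\nu(n) \uparrow \nu$, $\sigma_k^2 \ge 0$). That theorem implies the former equality as well, for $X_{\nu(n)}^2 \to X_\nu^2$ a.s. and $X_{\nu(n)}^2 \uparrow$. To verify the latter claim, note that $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ is a martingale (so that $\{X_n^2, \mathfrak{F}_n\}$ is a submartingale), and therefore, for any $A \in \mathfrak{F}_n$,

$$\begin{aligned}
\mathbf{E}(X_{\nu(n)}^2;\ A) &= \mathbf{E}(X_\nu^2;\ A \cap \{\nu \le n\}) + \mathbf{E}(X_n^2;\ A \cap \{\nu > n\}) \\
&\le \mathbf{E}(X_\nu^2;\ A \cap \{\nu \le n\}) + \mathbf{E}\big(\mathbf{E}(X_{n+1}^2 \mid \mathfrak{F}_n);\ A \cap \{\nu > n\}\big) \\
&= \mathbf{E}(X_\nu^2;\ A \cap \{\nu < n + 1\}) + \mathbf{E}(X_{n+1}^2;\ A \cap \{\nu \ge n + 1\}) \\
&= \mathbf{E}(X_{\nu(n+1)}^2;\ A).
\end{aligned}$$

Thus (15.2.12) and (15.2.13) imply (15.2.11), and the theorem is completely proved. □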

The main assertion of Theorem 15.2.2 for martingales (submartingales):

$$\mathbf{E} X_\nu = \mathbf{E} X_0 \quad (\ge \mathbf{E} X_0) \tag{15.2.14}$$

was obtained as a consequence of Theorem 15.2.1. However, we could get it directly from some rather transparent relations which, moreover, enable one to extend it to improper stopping times $\nu$.

A stopping time $\nu$ is called improper if $0 < \mathbf{P}(\nu < \infty) = 1 - \mathbf{P}(\nu = \infty) < 1$. To give an example of an improper stopping time, consider independent identically distributed random variables $\xi_k$, $a = \mathbf{E}\xi_k < 0$, $X_n = \sum_{k=1}^{n} \xi_k$, and put

$$\nu = \eta(x) := \min\{k \ge 1 : X_k > x\}, \quad x \ge 0.$$

Here $\nu$ is finite only for such trajectories $\{X_k\}$ that $\sup_k X_k > x$. If the last inequality does not hold, we put $\nu = \infty$. Clearly,

$$\mathbf{P}(\nu = \infty) = \mathbf{P}\Big(\sup_k X_k \le x\Big) > 0.$$

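Thus, for an arbitrary (possibly improper) stopping time, we have

$$\mathbf{E}(X_\nu;\ \nu < \infty) = \sum_{k=0}^{\infty} \mathbf{E}(X_k;\ \nu = k) = \sum_{k=0}^{\infty} \big[\mathbf{E}(X_k;\ \nu \ge k) - \mathbf{E}(X_k;\ \nu \ge k + 1)\big]. \tag{15.2.15}$$

Assume now that changing the order of summation is justified here. Then, by virtue of the relation $\{\nu \ge k + 1\} \in \mathfrak{F}_k$, we get

$$\begin{aligned}
\mathbf{E}(X_\nu;\ \nu < \infty) &= \mathbf{E} X_0 + \sum_{k=0}^{\infty} \mathbf{E}(X_{k+1} - X_k;\ \nu \ge k + 1) \\
&= \mathbf{E} X_0 + \sum_{k=0}^{\infty} \mathbf{E}\,\mathrm{I}(\nu \ge k + 1)\,\mathbf{E}(X_{k+1} - X_k \mid \mathfrak{F}_k).
\end{aligned} \tag{15.2.16}$$

Since for martingales (submartingales) the factors $\mathbf{E}(X_{k+1} - X_k \mid \mathfrak{F}_k) = 0$ ($\ge 0$), we obtain the following.

Theorem 15.2.3 If the change of the order of summation in (15.2.15) and (15.2.16) is legitimate then, for martingales (submartingales),

$$\mathbf{E}(X_\nu;\ \nu < \infty) = \mathbf{E} X_0 \quad (\ge \mathbf{E} X_0). \tag{15.2.17}$$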

Assumptions (15.2.8) and (15.2.9) of Theorem 15.2.2 are nothing else but conditions ensuring the absolute convergence of the series in (15.2.15) (see the proof of Theorem 15.2.2) and (15.2.16), because the sum of the absolute values of the terms in (15.2.16) is dominated by

$$\sum_{k=0}^{\infty} \mathbf{E}(a_{k+1};\ \nu \ge k + 1) \le c \sum_{k=0}^{\infty} \mathbf{P}(\nu \ge k + 1) = c\,\mathbf{E}\nu < \infty,$$

where, as before, $a_k = \mathbf{E}(|\xi_k| \mid \mathfrak{F}_{k-1})$ with $\xi_k = X_k - X_{k-1}$, and $c$ is the constant from (15.2.9). This justifies the change of the order of summation.

There is still another way of proving (15.2.17) based on (15.2.15) specifying a simple condition ensuring the required justification. First note that identity (15.2.17) assumes that the expectation $\mathbf{E}(X_\nu;\ \nu < \infty)$ exists, i.e. both values $\mathbf{E}(X_\nu^{\pm};\ \nu < \infty)$ are finite, where $x^{\pm} = \max(\pm x, 0)$.

Theorem 15.2.4 1. Let $\{X_n, \mathfrak{F}_n\}$ be a martingale. Then the condition

$$\lim_{n \to \infty} \mathbf{E}(X_n;\ \nu > n) = 0 \tag{15.2.18}$$

is necessary and sufficient for the relation

$$\lim_{n \to \infty} \mathbf{E}(X_\nu;\ \nu \le n) = \mathbf{E} X_0. \tag{15.2.19}$$

A necessary and sufficient condition for (15.2.17) is that (15.2.18) holds and at least one of the values $\mathbf{E}(X_\nu^{\pm};\ \nu < \infty)$ is finite.

2. If $\{X_n, \mathfrak{F}_n\}$ is a supermartingale and

$$\liminf_{n \to \infty} \mathbf{E}(X_n;\ \nu > n) \ge 0, \tag{15.2.20}$$

then

$$\limsup_{n \to \infty} \mathbf{E}(X_\nu;\ \nu \le n) \le \mathbf{E} X_0.$$

If, in addition, at least one of the values $\mathbf{E}(X_\nu^{\pm};\ \nu < \infty)$ is finite then

$$\mathbf{E}(X_\nu;\ \nu < \infty) \le \mathbf{E} X_0.$$

3. If in conditions (15.2.18) and (15.2.20), we replace the quantity $\mathbf{E}(X_n;\ \nu > n)$ with $\mathbf{E}(X_n;\ \nu \ge n)$, the first two assertions of the theorem will remain true.

The corresponding symmetric assertions hold for submartingales.


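Proof As we have already mentioned, for martingales, $\mathbf{E}(\xi_k;\ \nu \ge k) = 0$. Therefore, by virtue of (15.2.18),

$$\mathbf{E} X_0 = \lim_{n \to \infty} \Big[\mathbf{E} X_0 + \sum_{k=1}^{n} \mathbf{E}(\xi_k;\ \nu \ge k) - \mathbf{E}(X_n;\ \nu \ge n + 1)\Big].$$

Here

$$\begin{aligned}
\sum_{k=1}^{n} \mathbf{E}(\xi_k;\ \nu \ge k) &= \sum_{k=1}^{n} \mathbf{E}(X_k;\ \nu \ge k) - \sum_{k=1}^{n} \mathbf{E}(X_{k-1};\ \nu \ge k) \\
&= \sum_{k=1}^{n} \mathbf{E}(X_k;\ \nu \ge k) - \sum_{k=0}^{n-1} \mathbf{E}(X_k;\ \nu \ge k + 1).
\end{aligned}$$

Hence

$$\begin{aligned}
\mathbf{E} X_0 &= \lim_{n \to \infty} \sum_{k=0}^{n} \big[\mathbf{E}(X_k;\ \nu \ge k) - \mathbf{E}(X_k;\ \nu \ge k + 1)\big] \\
&= \lim_{n \to \infty} \sum_{k=0}^{n} \mathbf{E}(X_k;\ \nu = k) = \lim_{n \to \infty} \mathbf{E}(X_\nu;\ \nu \le n).
\end{aligned}$$

These equalities also imply the necessity of condition (15.2.18).

If at least one of the values $\mathbf{E}(X_\nu^{\pm};\ \nu < \infty)$ is finite, then by the monotone convergence theorem

$$\begin{aligned}
\lim_{n \to \infty} \mathbf{E}(X_\nu;\ \nu \le n) &= \lim_{n \to \infty} \mathbf{E}(X_\nu^{+};\ \nu \le n) - \lim_{n \to \infty} \mathbf{E}(X_\nu^{-};\ \nu \le n) \\
&= \mathbf{E}(X_\nu^{+};\ \nu < \infty) - \mathbf{E}(X_\nu^{-};\ \nu < \infty) = \mathbf{E}(X_\nu;\ \nu < \infty).
\end{aligned}$$

The third assertion of the theorem follows from the fact that the stopping time $\nu(n) = \min(\nu, n)$ satisfies the conditions of the first part of the theorem (or those of Theorems 15.2.1 and 15.2.3), and therefore, for the martingale $\{X_n\}$,

$$\mathbf{E} X_0 = \mathbf{E} X_{\nu(n)} = \mathbf{E}(X_\nu;\ \nu < n) + \mathbf{E}(X_n;\ \nu \ge n),$$

so that (15.2.19) implies the convergence $\mathbf{E}(X_n;\ \nu \ge n) \to 0$ and vice versa.

The proof for semimartingales is similar. The theorem is proved. □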


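That assertions (15.2.17) and (15.2.19) are, generally speaking, not equivalent even when (15.2.18) holds (i.e., $\lim_{n \to \infty} \mathbf{E}(X_\nu;\ \nu \le n) = \mathbf{E}(X_\nu;\ \nu < \infty)$ is not always the case), can be illustrated by the following example. Let $\xi_k$ be independent random variables with

$$\mathbf{P}(\xi_k = 3^k) = \mathbf{P}(\xi_k = -3^k) = 1/2,$$

$\nu$ be independent of $\{\xi_k\}$, and $\mathbf{P}(\nu = k) = 2^{-k}$, $k = 1, 2, \ldots$. Then $X_0 = 0$, $X_k = X_{k-1} + \xi_k$ for $k \ge 1$ is a martingale,

$$\mathbf{E} X_n = 0, \quad \mathbf{P}(\nu < \infty) = 1, \quad \mathbf{E}(X_n;\ \nu > n) = \mathbf{E} X_n\,\mathbf{P}(\nu > n) = 0$$

by independence, and condition (15.2.18) is satisfied. By virtue of (15.2.19), this means that $\lim_{n \to \infty} \mathbf{E}(X_\nu;\ \nu \le n) = 0$ (one can also verify this directly). On the other hand, the expectation $\mathbf{E}(X_\nu;\ \nu < \infty) = \mathbf{E} X_\nu$ is not defined, since $\mathbf{E} X_\nu^{+} = \mathbf{E} X_\nu^{-} = \infty$. Indeed, clearly

$$X_{k-1} \ge -\frac{3^k - 3}{2}, \quad \{\xi_k = 3^k\} \subset \Big\{X_k \ge \frac{3^k + 3}{2}\Big\}, \quad \mathbf{P}\Big(X_k \ge \frac{3^k + 3}{2}\Big) \ge \frac{1}{2},$$

$$\mathbf{E} X_k^{+} \ge \frac{3^k + 3}{4}, \quad \mathbf{E} X_\nu^{+} = \sum_{k=1}^{\infty} 2^{-k}\,\mathbf{E} X_k^{+} \ge \sum_{k=1}^{\infty} 2^{-k-2}\,3^k = \infty.$$

By symmetry, we also have $\mathbf{E} X_\nu^{-} = \infty$.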
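
The lower bound for $\mathbf{E} X_k^{+}$ used in this example can be checked by brute force for small $k$. The sketch below is illustrative only (the function name `mean_positive_part` is ours). It enumerates all $2^k$ sign patterns with exact fractions and accumulates the partial sums of $\sum_k 2^{-k}\,\mathbf{E} X_k^{+}$, which grow geometrically.

```python
from fractions import Fraction
from itertools import product

def mean_positive_part(k):
    """Exact E X_k^+ for X_k = xi_1 + ... + xi_k with P(xi_j = +-3^j) = 1/2."""
    total = Fraction(0)
    for signs in product((-1, 1), repeat=k):
        x = sum(s * 3 ** (j + 1) for j, s in enumerate(signs))
        total += max(x, 0)
    return total / 2 ** k

partial = Fraction(0)
for k in range(1, 11):
    m = mean_positive_part(k)
    partial += m / 2 ** k          # adds the k-th term 2^{-k} E X_k^+
    print(k, m, float(partial))
```

In this particular example the sign of $X_k$ is decided by the last summand, so the enumeration returns $\mathbf{E} X_k^{+} = 3^k/2$, which is consistent with the bound $(3^k + 3)/4$ in the text.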


Corollary 15.2.2 1. If $\{X_n, \mathfrak{F}_n\}$ is a nonnegative martingale, then condition (15.2.18) is necessary and sufficient for (15.2.17).

2. If $\{X_n, \mathfrak{F}_n\}$ is a nonnegative supermartingale and $\nu$ is an arbitrary stopping time, then

$$\mathbf{E}(X_\nu;\ \nu < \infty) \le \mathbf{E} X_0. \tag{15.2.21}$$

Proof The assertion follows in an obvious way from Theorem 15.2.4 since one has $\mathbf{E}(X_\nu^{-};\ \nu < \infty) = 0$. □

Theorem  15.2.2  implies  the  already  known  Wald’s  identity  (see  Theorem  4.4.3) 
supplemented  with  another  useful  statement. 

Theorem 15.2.5 (Wald’s identity) Let $\xi_1, \xi_2, \ldots$ be independent identically distributed random variables, $S_n = \xi_1 + \cdots + \xi_n$, $S_0 = 0$, and assume $\mathbf{E}\xi_1 = a$. Let, further, $\nu$ be a stopping time with $\mathbf{E}\nu < \infty$. Then

$$\mathbf{E} S_\nu = a\,\mathbf{E}\nu. \tag{15.2.22}$$

If, moreover, $\sigma^2 = \mathrm{Var}\,\xi_k < \infty$, then

$$\mathbf{E}[S_\nu - \nu a]^2 = \sigma^2\,\mathbf{E}\nu. \tag{15.2.23}$$

Proof It is clear that $X_n = S_n - na$ forms a martingale and conditions (15.2.8) and (15.2.9) are met. Therefore $\mathbf{E} X_\nu = \mathbf{E} X_0 = 0$, which is equivalent to (15.2.22), and $\mathbf{E} X_\nu^2 = \sigma^2\,\mathbf{E}\nu$, which is equivalent to (15.2.23). □
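
Both identities can be verified exactly for a bounded stopping time. The following sketch is an illustration under our own assumptions: steps $\pm 1$ with $\mathbf{P}(\xi_k = 1) = p$, and $\nu$ taken as the first exit time from an interval truncated at a fixed horizon. The name `wald_check` and the parameter values are hypothetical.

```python
from fractions import Fraction

def wald_check(p, lo, hi, horizon):
    """Exact moments for nu = min(first exit time of S_k from (lo, hi), horizon).

    Steps are +1 with probability p and -1 with probability 1 - p.
    Returns (E S_nu, a * E nu, E (S_nu - a nu)^2, sigma^2 * E nu).
    """
    q = 1 - p
    a = p - q                     # E xi
    var = 1 - a * a               # Var xi, because xi^2 = 1
    alive = {0: Fraction(1)}      # law of S_k on the event {nu > k}
    e_s = e_nu = e_sq = Fraction(0)
    for k in range(1, horizon + 1):
        nxt = {}
        for pos, pr in alive.items():
            for step, w in ((1, p), (-1, q)):
                s, mass = pos + step, pr * w
                if s <= lo or s >= hi or k == horizon:
                    # the walk stops at time k in state s
                    e_s += mass * s
                    e_nu += mass * k
                    e_sq += mass * (s - a * k) ** 2
                else:
                    nxt[s] = nxt.get(s, Fraction(0)) + mass
        alive = nxt
    return e_s, a * e_nu, e_sq, var * e_nu

lhs1, rhs1, lhs2, rhs2 = wald_check(Fraction(2, 3), -3, 4, 30)
print(lhs1 == rhs1, lhs2 == rhs2)
```

Rational arithmetic is used so that (15.2.22) and (15.2.23) are confirmed as exact equalities rather than up to rounding.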

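Example 15.2.1 Consider a generalised renewal process (see Sect. 10.6) $S(t) = S_{\eta(t)}$, where $S_n = \sum_{j=1}^{n} \xi_j$ (in this example we follow the notation of Chap. 10 and change the meaning of the notation $S_n$ from the above), $\eta(t) = \min\{k : T_k > t\}$, $T_n = \sum_{j=1}^{n} \tau_j$ and $(\tau_j, \xi_j)$ are independent vectors distributed as $(\tau, \xi)$, $\tau > 0$. Set $a_\xi = \mathbf{E}\xi$, $a = \mathbf{E}\tau$, $\sigma_\xi^2 = \mathrm{Var}\,\xi$ and $\sigma^2 = \mathrm{Var}\,\tau$. As we know from Wald’s identity in Sect. 4.4,

$$\mathbf{E}\,\eta(t) = \frac{t + \mathbf{E}\chi(t)}{a}, \quad \mathbf{E} S(t) = a_\xi\,\mathbf{E}\,\eta(t),$$

where $\mathbf{E}\chi(t) = o(t)$ as $t \to \infty$ (see Theorem 10.1.1) and, in the non-lattice case, $\mathbf{E}\chi(t) \to \frac{\sigma^2 + a^2}{2a}$ if $\sigma^2 < \infty$ (see Theorem 10.4.3).

We now find $\mathrm{Var}\,\eta(t)$ and $\mathrm{Var}\,S(t)$. Omitting for brevity’s sake the argument $t$, we can write

$$\begin{aligned}
a^2\,\mathrm{Var}\,\eta(t) = a^2\,\mathrm{Var}\,\eta &= \mathbf{E}(a\eta - a\mathbf{E}\eta)^2 = \mathbf{E}(a\eta - T_\eta + T_\eta - a\mathbf{E}\eta)^2 \\
&= \mathbf{E}(T_\eta - a\eta)^2 + \mathbf{E}(T_\eta - a\mathbf{E}\eta)^2 - 2\,\mathbf{E}(T_\eta - a\eta)(T_\eta - a\mathbf{E}\eta).
\end{aligned}$$

The first summand on the right-hand side is equal to

$$\sigma^2\,\mathbf{E}\eta = \frac{\sigma^2 t}{a} + O(1)$$

by Theorem 15.2.5. The second summand equals, by (10.4.8) ($\chi(t) = T_{\eta(t)} - t$),

$$\mathbf{E}(t + \chi(t) - a\mathbf{E}\eta)^2 = \mathbf{E}(\chi(t) - \mathbf{E}\chi(t))^2 \le \mathbf{E}\chi^2(t) = o(t).$$

The last summand, by the Cauchy-Bunjakovsky inequality, is also $o(t)$. Finally, we get

$$\mathrm{Var}\,\eta(t) = \frac{\sigma^2 t}{a^3} + o(t).$$

Consider now (with $r = a_\xi/a$, $\zeta_j := \xi_j - r\tau_j$, $\mathbf{E}\zeta_j = 0$)

$$\begin{aligned}
\mathrm{Var}\,S(t) &= \mathbf{E}(S_\eta - a_\xi\mathbf{E}\eta)^2 = \mathbf{E}\big[S_\eta - rT_\eta + r(T_\eta - a\mathbf{E}\eta)\big]^2 \\
&= \mathbf{E}\Big(\sum_{j=1}^{\eta} \zeta_j\Big)^2 + r^2\,\mathbf{E}(T_\eta - a\mathbf{E}\eta)^2 + 2r\,\mathbf{E}\Big(\sum_{j=1}^{\eta} \zeta_j\Big)(T_\eta - a\mathbf{E}\eta).
\end{aligned}$$

The first term on the right-hand side is equal to

$$\mathbf{E}\eta\,\mathrm{Var}\,\zeta = \frac{t\,\mathrm{Var}\,\zeta}{a} + O(1)$$

by Theorem 15.2.5. The second term has already been estimated above. Therefore, as before, the sum of the last two terms is $o(t)$. Thus

$$\mathrm{Var}\,S(t) = \frac{t}{a}\,\mathbf{E}(\xi - r\tau)^2 + o(t).$$

This corresponds to the scaling used in Theorem 10.6.2.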


Example 15.2.2 Examples 4.4.4 and 4.5.5 referring to the fair game situation with $\mathbf{P}(\xi_k = \pm 1) = 1/2$ and $\nu = \min\{k : S_k = z_2 \text{ or } S_k = -z_1\}$ ($z_1$ and $z_2$ being the capitals of the gamblers) can also illustrate the use of Theorem 15.2.5.

Now consider the case $p = \mathbf{P}(\xi_k = 1) \ne 1/2$. The sequence $X_n = (q/p)^{S_n}$, $n \ge 0$, $q = 1 - p$, is a martingale, since

$$\mathbf{E}(q/p)^{\xi_k} = p(q/p) + q(p/q) = 1.$$

By Theorem 15.2.5 (the probabilities $P_1$ and $P_2$ were defined in Example 4.4.5),

$$\mathbf{E} X_\nu = \mathbf{E} X_0 = 1, \quad P_1 (q/p)^{z_2} + P_2 (q/p)^{-z_1} = 1.$$

From this relation and equality $P_1 + P_2 = 1$ we have

$$P_1 = \frac{(q/p)^{-z_1} - 1}{(q/p)^{-z_1} - (q/p)^{z_2}}, \quad P_2 = 1 - P_1.$$

Using Wald’s identity again, we also obtain that

$$\mathbf{E}\nu = \frac{\mathbf{E} S_\nu}{\mathbf{E}\xi_1} = \frac{P_1 z_2 - P_2 z_1}{p - q}.$$

Note that these equalities could have been obtained by elementary methods but this would require lengthy calculations.
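
The two formulas can be cross-checked against the elementary one-step equations. The sketch below is an assumption-laden illustration: the helper names and the extra argument `x` (a starting point other than $0$, obtained by shifting both capitals) are ours. It verifies with exact fractions that the martingale formulas satisfy the total probability recursions for the winning probability and the mean duration.

```python
from fractions import Fraction

def win_probability(p, z1, z2, x=0):
    """P(walk started at x reaches z2 before -z1), from E (q/p)^{S_nu} = (q/p)^x."""
    r = (1 - p) / p
    return (r ** (-z1) - r ** x) / (r ** (-z1) - r ** z2)

def mean_duration(p, z1, z2, x=0):
    """E nu for the walk started at x, from Wald's identity E S_nu - x = (p - q) E nu."""
    q = 1 - p
    h = win_probability(p, z1, z2, x)
    return (h * (z2 - x) - (1 - h) * (z1 + x)) / (p - q)

p, z1, z2 = Fraction(2, 5), 3, 4
q = 1 - p
for x in range(-z1 + 1, z2):
    # one-step equations that characterise both quantities
    assert win_probability(p, z1, z2, x) == (
        p * win_probability(p, z1, z2, x + 1) + q * win_probability(p, z1, z2, x - 1))
    assert mean_duration(p, z1, z2, x) == 1 + (
        p * mean_duration(p, z1, z2, x + 1) + q * mean_duration(p, z1, z2, x - 1))
print(win_probability(p, z1, z2), mean_duration(p, z1, z2))
```

For $x = 0$ the two functions reduce to the expressions for $P_1$ and $\mathbf{E}\nu$ given in the example.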


¹ See, e.g., [12].
In the cases when the nature of $S_\nu$ is simple enough, the assertions of the type of Theorems 15.2.1-15.2.2 enable one to obtain (or estimate) the distribution of the random variable $\nu$ itself. In such situations, the following assertion is rather helpful.

Suppose that the conditions of Theorem 15.2.5 are met, but, instead of conditions on the moments of $\xi_n$, the Cramér condition (cf. Chap. 9) is assumed to be satisfied:

$$\psi(\lambda) := \mathbf{E} e^{\lambda\xi} < \infty$$

for some $\lambda \ne 0$.

In other words, if

$$\lambda_+ := \sup(\lambda : \psi(\lambda) < \infty) \ge 0, \quad \lambda_- := \inf(\lambda : \psi(\lambda) < \infty) \le 0,$$

then $\lambda_+ - \lambda_- > 0$. Everywhere in what follows we will only consider the values

$$\lambda \in B := \{\psi(\lambda) < \infty\} \subset [\lambda_-, \lambda_+]$$

for which $\psi'(\lambda) < \infty$. For such $\lambda$, the positive martingale

$$X_n = \frac{e^{\lambda S_n}}{\psi^n(\lambda)}, \quad X_0 = 1,$$

is well-defined so that $\mathbf{E} X_n = 1$.


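Theorem 15.2.6 Let $\nu$ be an arbitrary stopping time and $\lambda \in B$. Then

$$\mathbf{E}\Big(\frac{e^{\lambda S_\nu}}{\psi(\lambda)^\nu};\ \nu < \infty\Big) \le 1 \tag{15.2.24}$$

and, for any $s > 1$ and $r > 1$ such that $1/r + 1/s = 1$,

$$\mathbf{E}(e^{\lambda S_\nu};\ \nu < \infty) \le \big\{\mathbf{E}[\psi^{r\nu/s}(\lambda s);\ \nu < \infty]\big\}^{1/r}. \tag{15.2.25}$$

A necessary and sufficient condition for

$$\mathbf{E}\Big(\frac{e^{\lambda S_\nu}}{\psi(\lambda)^\nu};\ \nu < \infty\Big) = 1 \tag{15.2.26}$$

is that

$$\lim_{n \to \infty} \mathbf{E}\Big(\frac{e^{\lambda S_n}}{\psi(\lambda)^n};\ \nu > n\Big) = 0. \tag{15.2.27}$$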

Remark 15.2.1 Relation (15.2.26) is known as the fundamental Wald identity. In the literature it is usually considered for a.s. finite $\nu$ (when $\mathbf{P}(\nu < \infty) = 1$), being in that case an extension of the obvious equality $\mathbf{E} e^{\lambda S_n} = \psi^n(\lambda)$ to the case of random $\nu$. Originally, identity (15.2.26) was established by A. Wald in the special case where $\nu$ is the exit time of the sequence $\{S_n\}$ from a finite interval (see Corollary 15.2.3), and was accompanied by rather restrictive conditions. Later, these conditions were removed (see e.g. [13]). Below we will obtain a more general assertion for the problem on the first exit of the trajectory $\{S_n\}$ from a strip with curvilinear boundaries.

Remark 15.2.2 The fundamental Wald identity shows that, although the nature of a stopping time could be quite general, there exists a stiff functional constraint (15.2.26) on the joint distribution of $\nu$ and $S_\nu$ (the distribution of $\xi_k$ is assumed to be known). In the cases where one of these variables can somehow be “computed” or “eliminated” (see Examples 15.2.2-15.2.4) Wald’s identity turns into an explicit formula for the Laplace transform of the distribution of the other variable. If $\nu$ and $S_\nu$ prove to be independent (which rarely happens), then (15.2.26) gives the relationship

$$\mathbf{E} e^{\lambda S_\nu} = \big[\mathbf{E}\psi(\lambda)^{-\nu}\big]^{-1}$$

between the Laplace transforms of the distributions of $\nu$ and $S_\nu$.

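Proof of Theorem 15.2.6 As we have already noted, for

$$X_n = e^{\lambda S_n}\psi^{-n}(\lambda), \quad \mathfrak{F}_n = \sigma(\xi_1, \ldots, \xi_n),$$

$\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ is a positive martingale with $X_0 = 1$ and $\mathbf{E} X_n = 1$. Corollary 15.2.2 immediately implies (15.2.24).

Inequality (15.2.25) is a consequence of Hölder’s inequality and (15.2.24):

$$\mathbf{E}(e^{(\lambda/s) S_\nu};\ \nu < \infty) = \mathbf{E}\Big[\Big(\frac{e^{\lambda S_\nu}}{\psi^\nu(\lambda)}\Big)^{1/s} \psi^{\nu/s}(\lambda);\ \nu < \infty\Big] \le \big[\mathbf{E}(\psi^{\nu r/s}(\lambda);\ \nu < \infty)\big]^{1/r}.$$

The last assertion of the theorem (concerning the identity (15.2.26)) follows from Theorem 15.2.4. □

We now consider several important special cases. Note that $\psi(\lambda)$ is a convex function ($\psi''(\lambda) > 0$), $\psi(0) = 1$, and therefore there exists a unique point $\lambda_0$ at which $\psi(\lambda)$ attains its minimum value $\psi(\lambda_0) \le 1$ (see also Sect. 9.1).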

Corollary 15.2.3 Assume that we are given a sequence $g(n)$ such that

$$g^{+}(n) := \max(0, g(n)) = o(n) \quad \text{as } n \to \infty.$$

If $S_n \le g(n)$ holds on the set $\{\nu > n\}$, then (15.2.26) holds for $\lambda \in (\lambda_0, \lambda_+] \cap B$, $B = \{\lambda : \psi(\lambda) < \infty\}$.

The random variable $\nu = \nu_g = \inf\{k \ge 1 : S_k \ge g(k)\}$ for $g(k) = o(k)$ obviously satisfies the conditions of Corollary 15.2.3. For stopping times $\nu_g$ one could also consider the case $g(n)/n \to c \ge 0$ as $n \to \infty$, which can be reduced to the case $g(n) = o(n)$ by introducing the random variables

$$\xi_k^{*} := \xi_k - c, \quad S_k^{*} := \sum_{j=1}^{k} \xi_j^{*},$$

for which $\nu_g = \inf\{k \ge 1 : S_k^{*} \ge g(k) - ck\}$.


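Proof of Corollary 15.2.3 For $\lambda > \lambda_0$, $\lambda \in B$, we have

$$\begin{aligned}
\mathbf{E}\Big(\frac{e^{\lambda S_n}}{\psi^n(\lambda)};\ \nu > n\Big) &\le \psi^{-n}(\lambda)\,\mathbf{E}(e^{\lambda S_n};\ S_n \le g(n)) \\
&= \psi^{-n}(\lambda)\,\mathbf{E}(e^{(\lambda - \lambda_0) S_n} \cdot e^{\lambda_0 S_n};\ S_n \le g(n)) \\
&\le \psi^{-n}(\lambda)\,e^{(\lambda - \lambda_0) g(n)}\,\mathbf{E}(e^{\lambda_0 S_n};\ S_n \le g(n)) \\
&\le \psi^{-n}(\lambda)\,e^{(\lambda - \lambda_0) g^{+}(n)}\,\mathbf{E} e^{\lambda_0 S_n} = \Big(\frac{\psi(\lambda_0)}{\psi(\lambda)}\Big)^{n} e^{(\lambda - \lambda_0) g^{+}(n)} \to 0
\end{aligned}$$

as $n \to \infty$, because $(\lambda - \lambda_0) g^{+}(n) = o(n)$. It remains to use Theorem 15.2.6. The corollary is proved. □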


We  now  return  to  Theorem  15.2.6  for  arbitrary  stopping  times.  It  turns  out  that, 
based  on  the  Cramer  transform  introduced  in  Sect.  9.1,  one  can  complement  its 
assertions  without  using  any  martingale  techniques. 

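Together with the original distribution $\mathbf{P}$ of the sequence $\{\xi_k\}_{k=1}^{\infty}$ we introduce the family of distributions $\mathbf{P}_\lambda$ of this sequence in $(\mathbb{R}^{\infty}, \mathfrak{B}^{\infty})$ (see Sect. 5.5) generated by the finite-dimensional distributions

$$\mathbf{P}_\lambda(\xi_k \in dx_k) = \frac{e^{\lambda x_k}}{\psi(\lambda)}\,\mathbf{P}(\xi_k \in dx_k), \quad \mathbf{P}_\lambda(\xi_1 \in dx_1, \ldots, \xi_n \in dx_n) = \prod_{k=1}^{n} \mathbf{P}_\lambda(\xi_k \in dx_k).$$

This is the Cramér transform of the distribution $\mathbf{P}$.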
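
For a distribution with finitely many atoms the one-step Cramér transform is a simple reweighting. The following sketch is illustrative: the function names and the two-point law are our assumptions. It tilts a law by $e^{\lambda x}/\psi(\lambda)$ and prints $\psi(\lambda)$ together with the mean of the tilted law for three values of $\lambda$.

```python
from math import exp, log

def cramer_transform(law, lam):
    """Return (tilted law, psi(lam)) for a finite law given as {value: probability}."""
    psi = sum(pr * exp(lam * x) for x, pr in law.items())
    tilted = {x: pr * exp(lam * x) / psi for x, pr in law.items()}
    return tilted, psi

def mean(law):
    """Expectation of a finite law {value: probability}."""
    return sum(x * pr for x, pr in law.items())

law = {1: 0.3, -1: 0.7}                  # E xi = -0.4 < 0
lam0 = 0.5 * log(0.7 / 0.3)              # minimiser of psi for this two-point law
for lam in (0.0, lam0, 2 * lam0):
    tilted, psi = cramer_transform(law, lam)
    print(round(lam, 4), round(psi, 4), round(mean(tilted), 4))
```

For this law the mean of the tilted distribution is negative at $\lambda = 0$, vanishes at the minimiser of $\psi$, and is positive beyond it, so the tilt can turn a walk with negative drift into one with positive drift.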


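Theorem 15.2.7 Let $\nu$ be an arbitrary stopping time. Then, for any $\lambda \in B$,

$$\mathbf{E}\Big(\frac{e^{\lambda S_\nu}}{\psi^\nu(\lambda)};\ \nu < \infty\Big) = \mathbf{P}_\lambda(\nu < \infty). \tag{15.2.28}$$

Proof Since $\{\nu = n\} \in \sigma(\xi_1, \ldots, \xi_n)$, there exists a Borel set $D_n \subset \mathbb{R}^n$ such that

$$\{\nu = n\} = \{(\xi_1, \ldots, \xi_n) \in D_n\}.$$

Further,

$$\mathbf{E}\Big(\frac{e^{\lambda S_\nu}}{\psi^\nu(\lambda)};\ \nu < \infty\Big) = \sum_{n=0}^{\infty} \mathbf{E}\Big(\frac{e^{\lambda S_n}}{\psi^n(\lambda)};\ \nu = n\Big),$$

where

$$\begin{aligned}
\mathbf{E}\Big(\frac{e^{\lambda S_n}}{\psi^n(\lambda)};\ \nu = n\Big) &= \int_{(x_1, \ldots, x_n) \in D_n} \frac{e^{\lambda(x_1 + \cdots + x_n)}}{\psi^n(\lambda)}\,\mathbf{P}(\xi_1 \in dx_1, \ldots, \xi_n \in dx_n) \\
&= \int_{(x_1, \ldots, x_n) \in D_n} \mathbf{P}_\lambda(\xi_1 \in dx_1, \ldots, \xi_n \in dx_n) = \mathbf{P}_\lambda(\nu = n).
\end{aligned}$$

This proves the theorem. □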
For a given function $g(n)$, consider now the stopping time

$$\nu = \nu_g = \inf\{k : S_k \ge g(k)\}$$

(cf. Corollary 15.2.3). The assertion of Theorem 15.2.7 can be obtained in that case in the following way. Denote by $\mathbf{E}_\lambda$ the expectation with respect to the distribution $\mathbf{P}_\lambda$.

Corollary 15.2.4 1. If $g^{+}(n) = \max(0, g(n)) = o(n)$ as $n \to \infty$ and $\lambda \in (\lambda_0, \lambda_+] \cap B$, then one has $\mathbf{P}_\lambda(\nu_g < \infty) = 1$ in relation (15.2.28).

2. If $g(n) \ge 0$ and $\lambda < \lambda_0$, then $\mathbf{P}_\lambda(\nu_g < \infty) < 1$.

3. For $\lambda = \lambda_0$, the distribution $\mathbf{P}_{\lambda_0}$ of the variable $\nu$ can either be proper (when one has $\mathbf{P}_{\lambda_0}(\nu_g < \infty) = 1$) or improper ($\mathbf{P}_{\lambda_0}(\nu_g < \infty) < 1$). If $\lambda_0 \in (\lambda_-, \lambda_+)$, $g(n) < (1 - \varepsilon)\sigma(2n \log\log n)^{1/2}$ for all $n \ge n_0$, starting from some $n_0$, and $\sigma^2 = \mathbf{E}_{\lambda_0}\xi_1^2$, then $\mathbf{P}_{\lambda_0}(\nu_g < \infty) = 1$.

But if $\lambda_0 \in (\lambda_-, \lambda_+)$, $g(n) \ge 0$, and $g(n) > (1 + \varepsilon)\sigma(2n \log\log n)^{1/2}$ for $n \ge n_0$, then $\mathbf{P}_{\lambda_0}(\nu_g < \infty) < 1$ (we exclude the trivial case $\xi_k \equiv 0$).

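Proof Since $\mathbf{E}_\lambda\xi_k = \frac{\psi'(\lambda)}{\psi(\lambda)}$, the expectation $\mathbf{E}_\lambda\xi_k$ is of the same sign as the difference $\lambda - \lambda_0$, and $\mathbf{E}_{\lambda_0}\xi_k = 0$ ($\psi'(\lambda_0) = 0$ if $\lambda_0 \in (\lambda_-, \lambda_+)$). Hence the first assertion follows from the relations

$$\mathbf{P}_\lambda(\nu = \infty) = \mathbf{P}_\lambda(X_n < g(n) \text{ for all } n) \le \mathbf{P}_\lambda(X_n < g^{+}(n)) \to 0$$

as $n \to \infty$ by the law of large numbers for the sums $X_n = \sum_{k=1}^{n} \xi_k$, since $\mathbf{E}_\lambda\xi_k > 0$.

The second assertion is a consequence of the strong law of large numbers since $\mathbf{E}_\lambda\xi_k < 0$ and hence $\mathbf{P}_\lambda(\nu = \infty) \ge \mathbf{P}_\lambda(\sup_n X_n < 0) > 0$.

The last assertion of the corollary follows from the law of the iterated logarithm which we prove in Sect. 20.2. The corollary is proved. □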

The condition $g(n) \ge 0$ of part 2 of the corollary can clearly be weakened to the condition $g(n) = o(n)$, $\mathbf{P}(\nu > n) > 0$ for any $n > 0$. The same is true for part 3.

An assertion similar to Corollary 15.2.4 is also true for the (stopping) time $\nu_{g_-, g_+}$ of the first passage of one of the two boundaries $g_\pm(n) = o(n)$:

$$\nu_{g_-, g_+} := \inf\{k \ge 1 : S_k \ge g_+(k) \text{ or } S_k \le g_-(k)\}.$$

Corollary 15.2.5 For $\lambda \in B \setminus \{\lambda_0\}$, we have $\mathbf{P}_\lambda(\nu_{g_-, g_+} < \infty) = 1$.

If $\lambda = \lambda_0 \in (\lambda_-, \lambda_+)$, then the distribution $\mathbf{P}_{\lambda_0}$ of $\nu$ may be either proper or improper.

If, for some $n_0 > 2$,

$$g_\pm(n) \lessgtr \pm(1 - \varepsilon)\sigma\sqrt{2n \ln\ln n}$$

for $n \ge n_0$ then $\mathbf{P}_{\lambda_0}(\nu_{g_-, g_+} < \infty) = 1$.

If $g_\pm(n) \gtrless 0$ and, additionally,

$$g_\pm(n) \gtrless \pm(1 + \varepsilon)\sigma\sqrt{2n \ln\ln n}$$

for $n \ge n_0$ then $\mathbf{P}_{\lambda_0}(\nu_{g_-, g_+} < \infty) < 1$.
Proof The first assertion follows from Corollary 15.2.4 applied to the sequences $\{\pm X_n\}$. The second is a consequence of the law of the iterated logarithm from Sect. 20.2. □

We now consider several relations following from Corollaries 15.2.3, 15.2.4 and 15.2.5 (from identity (15.2.26)) for the random variables $\nu = \nu_g$ and $\nu = \nu_{g_-, g_+}$.

Let $a < 0$ and $\psi(\lambda_+) \ge 1$. Since $\psi'(0) = a < 0$ and the function $\psi(\lambda)$ is convex, the equation $\psi(\lambda) = 1$ will have a unique root $\mu > 0$ in the domain $\lambda > 0$. Setting $\lambda = \mu$ in (15.2.26) we obtain the following.

Corollary 15.2.6 If $a < 0$ and $\psi(\lambda_+) \ge 1$ then, for the stopping times $\nu = \nu_g$ and $\nu = \nu_{g_-, g_+}$, we have the equality

$$\mathbf{E}(e^{\mu S_\nu};\ \nu < \infty) = 1.$$


Remark 15.2.3 For an $x > 0$, put (as in Chap. 10) $\eta(x) := \inf\{k : S_k > x\}$. Since $S_{\eta(x)} = x + \chi(x)$, where $\chi(x) := S_{\eta(x)} - x$ is the value of overshoot over the level $x$, Corollary 15.2.6 implies

$$\mathbf{E}(e^{\mu(x + \chi(x))};\ \eta(x) < \infty) = 1. \tag{15.2.29}$$

Note that $\mathbf{P}(\eta(x) < \infty) = \mathbf{P}(S > x)$, where $S = \sup_{k \ge 0} S_k$. Therefore, Theorem 12.7.4 and (15.2.29) imply that, as $x \to \infty$,

$$e^{\mu x}\,\mathbf{P}(\eta(x) < \infty) = \big[\mathbf{E}(e^{\mu\chi(x)} \mid \eta(x) < \infty)\big]^{-1} \to c. \tag{15.2.30}$$

The last convergence relation corresponds to the fact that the limiting conditional distribution (as $x \to \infty$) $G$ of $\chi(x)$ exists given $\eta(x) < \infty$. If we denote by $\chi$ a random variable with the distribution $G$ then (15.2.30) will mean that $c = [\mathbf{E} e^{\mu\chi}]^{-1} < 1$. This provides an interpretation of the constant $c$ that is different from the one in Theorem 12.7.4.


In Corollary 15.2.6 we “eliminated” the “component” $\psi^{\nu}(\lambda)$ in identity (15.2.26). “Elimination” of the other component $e^{\lambda S_\nu}$ is possible only in some special cases of random walks, such as the so-called skip-free walks (see Sect. 12.8) or walks with exponentially (or geometrically) distributed $\xi_k^{+} = \max(0, \xi_k)$ or $\xi_k^{-} = -\min(0, \xi_k)$. We will illustrate this with two examples.


Example 15.2.3 We return to the ruin problem discussed in Example 15.2.2. In that case, Corollary 15.2.4 gives, for $g_-(n) := -z_1$ and $g_+(n) = z_2$, that

$$e^{\lambda z_2}\,\mathbf{E}(\psi(\lambda)^{-\nu};\ S_\nu = z_2) + e^{-\lambda z_1}\,\mathbf{E}(\psi(\lambda)^{-\nu};\ S_\nu = -z_1) = 1.$$

In particular, for $z_1 = z_2 = z$ and $p = 1/2$, we have by symmetry that

$$\mathbf{E}(\psi(\lambda)^{-\nu};\ S_\nu = z) = \frac{1}{e^{\lambda z} + e^{-\lambda z}}, \quad \mathbf{E}(\psi(\lambda)^{-\nu}) = \frac{2}{e^{\lambda z} + e^{-\lambda z}}. \tag{15.2.31}$$

Let $\lambda(s)$ be the unique positive solution of the equation $s\psi(\lambda) = 1$, $s \in (0, 1)$. Since here $\psi(\lambda) = \frac{1}{2}(e^{\lambda} + e^{-\lambda})$, solving the quadratic equation yields

$$e^{\lambda(s)} = \frac{1 + \sqrt{1 - s^2}}{s}.$$

Identity (15.2.31) now gives

$$\mathbf{E} s^{\nu} = 2\big(e^{\lambda(s) z} + e^{-\lambda(s) z}\big)^{-1}.$$

We obtain an explicit form of the generating function of the random variable $\nu$, which enables us to find the probabilities $\mathbf{P}(\nu = n)$, $n = 1, 2, \ldots$ by expanding elementary functions into series.
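
The closed form for $\mathbf{E} s^{\nu}$ can be compared with a direct summation of $\sum_n s^n\,\mathbf{P}(\nu = n)$. The sketch below is illustrative: the function names, the truncation level `terms` and the test values $z = 3$, $s = 0.9$ are our assumptions. The probabilities $\mathbf{P}(\nu = n)$ are obtained by propagating the law of the symmetric walk killed on leaving $(-z, z)$.

```python
from math import sqrt

def exit_time_pgf(z, s, terms=2000):
    """Truncated sum of s^n P(nu = n), nu being the exit time of the symmetric walk from (-z, z)."""
    alive = {0: 1.0}              # law of S_n on the event {nu > n}
    total, power = 0.0, 1.0
    for _ in range(terms):
        power *= s
        nxt = {}
        for pos, pr in alive.items():
            for nb in (pos - 1, pos + 1):
                if abs(nb) == z:
                    total += power * pr / 2       # exit happens at this step
                else:
                    nxt[nb] = nxt.get(nb, 0.0) + pr / 2
        alive = nxt
    return total

def closed_form(z, s):
    """2 / (e^{lambda(s) z} + e^{-lambda(s) z}) with e^{lambda(s)} = (1 + sqrt(1 - s^2)) / s."""
    root = (1 + sqrt(1 - s * s)) / s
    return 2 / (root ** z + root ** (-z))

print(exit_time_pgf(3, 0.9), closed_form(3, 0.9))
```

For $z = 1$ the exit time equals $1$, and both functions return $s$, which gives a quick sanity check of the formula.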


Example 15.2.4 Simple explicit formulas can also be obtained from Wald’s identity in the problem with one boundary, where $\nu = \nu_g$, $g(n) = z$. In that case, the class of distributions of $\xi_k$ could be wider than in Example 15.2.3. Suppose that one of the two following conditions holds (cf. Sect. 12.8).

1. The random walk is arithmetic and skip-free, i.e. $\xi_k$ are integers, $\mathbf{P}(\xi_k = 1) > 0$ and $\mathbf{P}(\xi_k \ge 2) = 0$.

2. The walk is right exponential, i.e.

$$\mathbf{P}(\xi_k > t) = c\,e^{-\alpha t} \tag{15.2.32}$$

either for all $t > 0$ or for $t = 0, 1, 2, \ldots$ if the walk is integer-valued (the geometric distribution).

The random variable $\nu_g$ will be proper if and only if $\mathbf{E}\xi_k = \psi'(0) \ge 0$ (see Chaps. 9 and 12). For skip-free random walks, Wald’s identity (15.2.26) yields ($g(n) = z > 0$, $S_\nu = z$)

$$e^{\lambda z}\,\mathbf{E}\psi^{-\nu}(\lambda) = 1, \quad \lambda > \lambda_0. \tag{15.2.33}$$

For $s \le 1$, the equation $\psi(\lambda) = s^{-1}$ (cf. Example 15.2.3) has in the domain $\lambda > \lambda_0$ a unique solution $\lambda(s)$. Therefore identity (15.2.33) can be written as

$$\mathbf{E} s^{\nu} = e^{-z\lambda(s)}. \tag{15.2.34}$$

This statement implies a series of results from Chaps. 9 and 12. Many properties of the distribution of $\nu := \nu_z$ can be derived from this identity, in particular, the asymptotics of $\mathbf{P}(\nu_z = n)$ as $z \to \infty$, $n \to \infty$. We already know one of the ways to find this asymptotics. It consists of using Theorem 12.8.4, which implies

$$\mathbf{P}(\nu_z = n) = \frac{z}{n}\,\mathbf{P}(S_n = z), \tag{15.2.35}$$

and the local Theorem 9.3.4 providing the asymptotics of $\mathbf{P}(S_n = z)$. Using relation (15.2.34) and the inversion formula is an alternative approach to studying the asymptotics of $\mathbf{P}(\nu_z = n)$. If we use the inversion formula, there will arise an integral of the form

$$\int_{|s| = 1} s^{-n} e^{-z\lambda(s)}\,ds, \tag{15.2.36}$$

where the integrand $s^{-n} e^{-z\lambda(s)}$, after the change of variable $\lambda(s) = \lambda$ (or $s = \psi(\lambda)^{-1}$), takes the form

$$\exp\{-(z\lambda - n \ln\psi(\lambda))\}.$$

The integrand in the inversion formula for the probability $\mathbf{P}(S_n = z)$ has the same form. This probability has already been studied quite well (see Theorem 9.3.4); its exponential part has the form $e^{-n\Lambda(\alpha)}$, where $\alpha = z/n$, $\Lambda(\alpha) = \sup_\lambda(\alpha\lambda - \ln\psi(\lambda))$ is the large deviation rate function (see Sect. 9.1 and the footnote for Definition 9.1.1). A more detailed study of the inversion formula (15.2.36) allows us to obtain (15.2.35).

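Similar relations can be obtained for random walks with exponential right distribution tails. Let, for example, (15.2.32) hold for all $t > 0$. Then the conditional distribution $\mathbf{P}(S_\nu - z > t \mid \nu = n,\ S_{n-1} = x)$ coincides with the distribution

$$\mathbf{P}(\xi_n > z - x + t \mid \xi_n > z - x) = e^{-\alpha t}$$

and clearly depends neither on $n$ nor on $x$. This means that $\nu$ and $S_\nu$ are independent, $S_\nu = z + \gamma$, where $\gamma$ is exponentially distributed with parameter $\alpha$, and

$$\mathbf{E}\psi(\lambda)^{-\nu} = \frac{1}{\mathbf{E} e^{(z + \gamma)\lambda}} = e^{-\lambda z}\,\frac{\alpha - \lambda}{\alpha}, \quad \lambda_0 < \lambda < \alpha; \qquad \mathbf{E} s^{\nu} = e^{-z\lambda(s)}\,\frac{\alpha - \lambda(s)}{\alpha},$$

where $\lambda(s)$ is, as before, the only solution to the equation $\psi(\lambda) = s^{-1}$ in the domain $\lambda > \lambda_0$. This implies the same results as (15.2.34).

If $\mathbf{P}(\xi_k > t) = c_1 e^{-\alpha t}$ and $\mathbf{P}(\xi_k < -t) = c_2 e^{-\beta t}$, $t > 0$, then, in the problem with two boundaries, we obtain for $\nu = \nu_{g_-, g_+}$, $g_+(n) = z_2$ and $g_-(n) = -z_1$ in exactly the same way from (15.2.26) that

$$\frac{\alpha e^{\lambda z_2}}{\alpha - \lambda}\,\mathbf{E}(\psi^{-\nu}(\lambda);\ S_\nu \ge z_2) + \frac{\beta e^{-\lambda z_1}}{\beta + \lambda}\,\mathbf{E}(\psi^{-\nu}(\lambda);\ S_\nu \le -z_1) = 1, \quad \lambda \in (-\beta, \alpha).$$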
15.3.1 Inequalities for Martingales


First of all we note that the property $\mathbf{E} X_n \le 1$ of the sequence $X_n = e^{\lambda S_n}\psi_0(\lambda)^{-n}$ forming a supermartingale for an appropriate function $\psi_0(\lambda)$ remains true when we replace $n$ with a stopping time $\nu$ (an analogue of inequality (15.2.24)) in a much more general case than that of Theorem 15.2.6. Namely, $\xi_n$ may be dependent.

Let, as before, $\{\mathfrak{F}_n\}$ be an increasing sequence of $\sigma$-algebras, and $\xi_n$ be $\mathfrak{F}_n$-measurable random variables. Suppose that a.s.

$$\mathbf{E}(e^{\lambda\xi_n} \mid \mathfrak{F}_{n-1}) \le \psi_0(\lambda). \tag{15.3.1}$$

This condition is always met if a.s.

$$\mathbf{P}(\xi_n \ge x \mid \mathfrak{F}_{n-1}) \le G(x), \quad \psi_0(\lambda) = -\int e^{\lambda x}\,dG(x) < \infty.$$

In that case the sequence $X_n = e^{\lambda S_n}\psi_0^{-n}(\lambda)$ forms a supermartingale:

$$\mathbf{E}(X_n \mid \mathfrak{F}_{n-1}) \le X_{n-1}, \quad \mathbf{E} X_n \le 1.$$

Theorem 15.3.1 Let (15.3.1) hold and $\nu$ be a stopping time. Then inequalities (15.2.24) and (15.2.25) will hold true with $\psi$ replaced by $\psi_0$.

The Proof of the theorem repeats almost verbatim that of Theorem 15.2.6. □

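Now we will obtain inequalities for the distribution of

$$\overline{X}_n = \max_{k \le n} X_k \quad \text{and} \quad X_n^{*} = \max_{k \le n} |X_k|,$$

$X_n$ being an arbitrary submartingale.

Theorem 15.3.2 (Doob) Let $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ be a nonnegative submartingale. Then, for all $x > 0$ and $n \ge 0$,

$$\mathbf{P}(\overline{X}_n > x) \le \frac{1}{x}\,\mathbf{E} X_n.$$

Proof Let

$$\nu = \eta(x) := \inf\{k \ge 0 : X_k > x\}, \quad \nu(n) := \min(\nu, n).$$

It is obvious that $n$ and $\nu(n)$ are stopping times, $\nu(n) \le n$, and therefore, by Theorem 15.2.1 (see (15.2.3) for $\nu_2 = n$, $\nu_1 = \nu(n)$),

$$\mathbf{E} X_n \ge \mathbf{E} X_{\nu(n)}.$$

Observing that $\{\overline{X}_n > x\} = \{X_{\nu(n)} > x\}$, we have from Chebyshev’s inequality that

$$\mathbf{P}(\overline{X}_n > x) = \mathbf{P}(X_{\nu(n)} > x) \le \frac{1}{x}\,\mathbf{E} X_{\nu(n)} \le \frac{1}{x}\,\mathbf{E} X_n.$$

The theorem is proved. □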
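
Doob's inequality can be checked exactly on a small example. In the sketch below (an illustration under our own choices: the nonnegative submartingale $X_k = |S_k|$ built from a symmetric $\pm 1$ walk, the function name `doob_sides`, and the values of $n$ and $x$) both sides of the inequality are computed with rational arithmetic.

```python
from fractions import Fraction
from math import comb

def doob_sides(n, x):
    """Both sides of Doob's inequality for X_k = |S_k|, S_k a symmetric +-1 walk.

    Returns (P(max_{k<=n} |S_k| > x), E|S_n| / x) for an integer level x >= 1.
    """
    alive = {0: Fraction(1)}      # law of S_k on {max_{j<=k} |S_j| <= x}
    for _ in range(n):
        nxt = {}
        for pos, pr in alive.items():
            for nb in (pos - 1, pos + 1):
                if abs(nb) <= x:
                    nxt[nb] = nxt.get(nb, Fraction(0)) + pr / 2
        alive = nxt
    left = 1 - sum(alive.values())
    # E|S_n| from the binomial law of the number of upward steps
    mean_abs = sum(Fraction(comb(n, j), 2 ** n) * abs(2 * j - n) for j in range(n + 1))
    return left, mean_abs / x

for x in (1, 2, 4, 8):
    left, right = doob_sides(40, x)
    print(x, float(left), float(right))
```

For small levels $x$ the bound exceeds $1$ and carries no information; it becomes useful once $x$ is larger than $\mathbf{E} X_n$.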

Theorem 15.3.2 implies the following.

Theorem 15.3.3 (The second Kolmogorov inequality) Let $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ be a martingale with a finite second moment $\mathbf{E} X_n^2$. Then $\{X_n^2, \mathfrak{F}_n;\ n \ge 0\}$ is a submartingale and by Theorem 15.3.2

$$\mathbf{P}(X_n^{*} > x) \le \frac{1}{x^2}\,\mathbf{E} X_n^2.$$

Originally A.N. Kolmogorov established this inequality for sums $X_n = \xi_1 + \cdots + \xi_n$ of independent random variables $\xi_n$. Theorem 15.3.3 extends Kolmogorov’s proof to the case of submartingales and refines Chebyshev’s inequality.

The following generalisation of Theorem 15.3.3 is also valid.

Theorem 15.3.4 If $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ is a martingale and $\mathbf{E}|X_n|^p < \infty$, $p \ge 1$, then $\{|X_n|^p, \mathfrak{F}_n;\ n \ge 0\}$ forms a nonnegative submartingale and, for all $x > 0$,

$$\mathbf{P}(X_n^{*} \ge x) \le \frac{1}{x^p}\,\mathbf{E}|X_n|^p.$$

If $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ is a submartingale, $\mathbf{E} e^{\lambda X_n} < \infty$, $\lambda > 0$, then $\{e^{\lambda X_n}, \mathfrak{F}_n;\ n \ge 0\}$ also forms a nonnegative submartingale,

$$\mathbf{P}(\overline{X}_n \ge x) \le e^{-\lambda x}\,\mathbf{E} e^{\lambda X_n}.$$

Both Theorem 15.3.4 and Theorem 15.3.3 immediately follow from Lemma 15.1.3 and Theorem 15.3.2.

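If $X_n = S_n = \sum_{k=1}^{n}\xi_k$, where $\xi_k$ are independent, identically distributed and satisfy the Cramér condition $\lambda_+ = \sup\{\lambda : \psi(\lambda) < \infty\} > 0$, then, with the help of the fundamental Wald identity, one can obtain sharper inequalities for $\mathbf{P}(\overline{X}_n > x)$ in the case $a = \mathbf{E}\xi_k < 0$.

Recall that, in the case $a = \psi'(0) < 0$, the function $\psi(\lambda) = \mathbf{E}e^{\lambda\xi_k}$ decreases in a neighbourhood of $\lambda = 0$, and, provided that $\psi(\lambda_+) \ge 1$, the equation $\psi(\lambda) = 1$ has a unique solution $\mu$ in the domain $\lambda > 0$.

Let $\zeta$ be a random variable having the same distribution as $\xi_k$. Put

$$\psi_+ := \sup_{t>0}\mathbf{E}\bigl(e^{\mu(\zeta - t)} \mid \zeta > t\bigr), \qquad \psi_- := \inf_{t>0}\mathbf{E}\bigl(e^{\mu(\zeta - t)} \mid \zeta > t\bigr).$$

If, for instance, $\mathbf{P}(\zeta > t) = ce^{-\alpha t}$ for $t > 0$ (in this case necessarily $\alpha > \mu$ in (15.2.32)), then

$$\mathbf{P}(\zeta - t > v \mid \zeta > t) = \frac{\mathbf{P}(\zeta > t + v)}{\mathbf{P}(\zeta > t)} = e^{-\alpha v}, \qquad \psi_+ = \psi_- = \frac{\alpha}{\alpha - \mu}.$$

A similar equality holds for integer-valued $\xi$ with a geometric distribution.

For other distributions, one has $\psi_+ > \psi_-$.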

Under the above conditions, one has the following assertion which supplements Theorem 12.7.4 for the distribution of the random variable $S = \sup_k S_k$.


Theorem 15.3.5 If $a = \mathbf{E}\xi < 0$ then

$$\psi_+^{-1}e^{-\mu x} \le \mathbf{P}(S > x) \le \psi_-^{-1}e^{-\mu x}, \qquad x > 0. \tag{15.3.2}$$


This theorem implies that, in the case of exponential right tails of the distribution of $\zeta$ (see (15.2.32)), inequalities (15.3.2) become the exact equality

$$\mathbf{P}(S > x) = \frac{\alpha - \mu}{\alpha}\,e^{-\mu x}.$$
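A small numerical sketch of the exponential-tail case may help. It assumes, purely for illustration, that $\xi = U - V$ with independent $U$ and $V$ exponentially distributed with rates $\alpha = 3$ and $\beta = 1$, so that $\mathbf{E}\xi < 0$ and $\mathbf{P}(\xi > t) = \frac{\beta}{\alpha+\beta}e^{-\alpha t}$ for $t > 0$. The model, the parameter values and the function names are hypothetical and are not taken from the text. The root $\mu$ of $\psi(\lambda) = 1$ is found by bisection and then inserted into the exact formula for $\mathbf{P}(S > x)$.

```python
import math

def psi(lam, alpha, beta):
    """E exp(lam * xi) for xi = U - V, U ~ Exp(rate alpha), V ~ Exp(rate beta).
    Finite for -beta < lam < alpha."""
    return (alpha / (alpha - lam)) * (beta / (beta + lam))

def cramer_root(alpha, beta, iterations=200):
    """Bisection for the positive solution mu of psi(mu) = 1 on (0, alpha)."""
    lo, hi = 1e-6, alpha - 1e-6
    if not (psi(lo, alpha, beta) < 1 < psi(hi, alpha, beta)):
        raise ValueError("no sign change: is E xi < 0 ?")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if psi(mid, alpha, beta) < 1:
            lo = mid      # still to the left of the root
        else:
            hi = mid
    return (lo + hi) / 2

def tail_of_supremum(x, alpha, beta):
    """P(S > x) = ((alpha - mu) / alpha) * exp(-mu * x) for exponential right tails."""
    mu = cramer_root(alpha, beta)
    return (alpha - mu) / alpha * math.exp(-mu * x)

alpha, beta = 3.0, 1.0
mu = cramer_root(alpha, beta)
print(mu, tail_of_supremum(1.0, alpha, beta))
```

For this model the equation $\psi(\lambda) = 1$ can also be solved by hand, giving $\mu = \alpha - \beta$, which provides a check on the bisection.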

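(The same result was obtained in Example 12.5.1.) This means that inequalities (15.3.2) are unimprovable. Since $\overline{S}_n = \max_{k \le n} S_k \le S$, relation (15.3.2) implies that, for any $n$,

$$\mathbf{P}(\overline{S}_n > x) \le \psi_-^{-1}e^{-\mu x}.$$

Proof of Theorem 15.3.5 Set $\nu := \infty$ if $S = \sup_{k \ge 0} S_k \le x$, and put $\nu := \eta(x) = \min\{k : S_k > x\}$ otherwise. Further, let $\chi(x) := S_{\eta(x)} - x$ be the excess over the level $x$. We have

$$\begin{aligned}
\mathbf{P}(\chi(x) > v;\ \nu < \infty) &= \sum_{k=1}^{\infty}\int_{-\infty}^{x}\mathbf{P}(\overline{S}_{k-1} \le x,\ S_{k-1} \in du,\ \xi_k > x - u + v) \\
&= \sum_{k=1}^{\infty}\int_{-\infty}^{x}\mathbf{P}(\overline{S}_{k-1} \le x,\ S_{k-1} \in du,\ \xi_k > x - u)\,\mathbf{P}(\xi_k > x - u + v \mid \xi_k > x - u),
\end{aligned}$$

$$\mathbf{E}(e^{\mu\chi(x)};\ \nu < \infty) \le \sum_{k=1}^{\infty}\int_{-\infty}^{x}\mathbf{P}(\overline{S}_{k-1} \le x,\ S_{k-1} \in du,\ \xi_k > x - u)\,\psi_+ = \psi_+\sum_{k=1}^{\infty}\mathbf{P}(\nu = k) = \psi_+\mathbf{P}(\nu < \infty).$$

Similarly,

$$\mathbf{E}(e^{\mu\chi(x)};\ \nu < \infty) \ge \psi_-\mathbf{P}(\nu < \infty).$$

Next, by Corollary 15.2.6,

$$1 = \mathbf{E}(e^{\mu S_\nu};\ \nu < \infty) = e^{\mu x}\,\mathbf{E}(e^{\mu\chi(x)};\ \nu < \infty) \le e^{\mu x}\psi_+\mathbf{P}(\nu < \infty).$$

Because $\mathbf{P}(\nu < \infty) = \mathbf{P}(S > x)$, we get from this the left inequality in (15.3.2). The right inequality is obtained in the same way from the lower bound with $\psi_-$. The theorem is proved. □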


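Remark 15.3.1 We proved Theorem 15.3.5 with the help of the fundamental Wald identity. But there is a direct proof based on the following relations:

$$\psi^n(\lambda) = \mathbf{E}e^{\lambda S_n} \ge \sum_{k=1}^{n}\mathbf{E}\bigl(e^{(S_k + S_n - S_k)\lambda};\ \nu = k\bigr) = \sum_{k=1}^{n}\mathbf{E}\bigl(e^{(x + \chi(x))\lambda}e^{(S_n - S_k)\lambda};\ \nu = k\bigr). \tag{15.3.3}$$

Here the random variables $e^{\lambda\chi(x)}\mathrm{I}(\nu = k)$ and $S_n - S_k$ are independent and, as before,

$$\mathbf{E}(e^{\lambda\chi(x)};\ \nu = k) \ge \psi_-\mathbf{P}(\nu = k).$$

Therefore, for all $\lambda$ such that $\psi(\lambda) \le 1$,

$$\psi^n(\lambda) \ge e^{\lambda x}\psi_-\sum_{k=1}^{n}\psi^{n-k}(\lambda)\,\mathbf{P}(\nu = k) \ge \psi_-e^{\lambda x}\psi^n(\lambda)\,\mathbf{P}(\nu \le n).$$

Hence we obtain

$$\mathbf{P}(\overline{S}_n > x) = \mathbf{P}(\nu \le n) \le \psi_-^{-1}e^{-\lambda x}.$$

Since the right-hand side does not depend on $n$, the same inequality also holds for $\mathbf{P}(S > x)$. The lower bound is obtained in a similar way. One just has to show that, in the original equality (cf. (15.3.3))

$$\psi^n(\lambda) = \sum_{k=1}^{n}\mathbf{E}(e^{\lambda S_n};\ \nu = k) + \mathbf{E}(e^{\lambda S_n};\ \nu > n),$$

one has $\mathbf{E}(e^{\lambda S_n};\ \nu > n) = o(1)$ as $n \to \infty$ for $\lambda = \mu$, which we did in Sect. 15.2.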


15.3.2  Inequalities  for  the  Number  of  Crossings  of  a  Strip 


We  now  return  to  arbitrary  submartingales  Xn  and  prove  an  inequality  that  will  be 
necessary  for  the  convergence  theorems  of  the  next  section.  It  concerns  the  number 
of crossings of a strip by the sequence $X_n$. Let $a < b$ be given numbers. Set $\nu_0 = 0$,

$$\nu_1 := \min\{n \ge 0 : X_n \le a\}, \qquad \nu_2 := \min\{n \ge \nu_1 : X_n \ge b\},$$

$$\nu_{2k-1} := \min\{n \ge \nu_{2k-2} : X_n \le a\}, \qquad \nu_{2k} := \min\{n \ge \nu_{2k-1} : X_n \ge b\}.$$

We put $\nu_m := \infty$ if the path $\{X_n\}$ for $n \ge \nu_{m-1}$ never crosses the corresponding level. Using this notation, one can define the number of upcrossings of the strip (interval) $[a, b]$ by the trajectory $X_0, \ldots, X_n$ as the random variable

$$\nu(a, b; n) := \begin{cases}\max\{k : \nu_{2k} \le n\} & \text{if } \nu_2 \le n, \\ 0 & \text{if } \nu_2 > n.\end{cases}$$

Set $(a)^+ = \max(0, a)$.
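The definition of $\nu(a, b; n)$ can be sketched as a single pass over a finite path: wait for a value $\le a$ (an odd-indexed time $\nu_{2k-1}$), then for a value $\ge b$ (the even-indexed time $\nu_{2k}$), and count each completed pair. The function name and the sample path below are illustrative assumptions; the path is chosen so that, with $a = 0$, the first three times are $\nu_1 = 2$, $\nu_2 = 5$, $\nu_3 = 8$.

```python
def upcrossings(path, a, b):
    """Number of upcrossings of [a, b] by path[0], ..., path[n]."""
    if not a < b:
        raise ValueError("need a < b")
    count = 0
    waiting_for_low = True        # next time has an odd index: wait for X_n <= a
    for x in path:
        if waiting_for_low:
            if x <= a:
                waiting_for_low = False
        elif x >= b:              # even index: the path has moved from <= a up to >= b
            count += 1
            waiting_for_low = True
    return count

path = [2, 1, 0, 1, 2, 3, 2, 1, 0, 4]
print(upcrossings(path, 0, 3))
```

Since $a < b$, a single observation cannot satisfy both $X_n \le a$ and $X_n \ge b$, so checking one condition per step agrees with the definition through the times $\nu_m$.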


Theorem 15.3.6 (Doob) Let $\{X_n, \mathfrak{F}_n;\ n \ge 0\}$ be a submartingale. Then, for all $n$,

$$\mathbf{E}\,\nu(a, b; n) \le \frac{\mathbf{E}(X_n - a)^+}{b - a}. \tag{15.3.4}$$

It is clear that inequality (15.3.4) assumes by itself that only the submartingale $\{X_k, \mathfrak{F}_k;\ 0 \le k \le n\}$ is given.

Proof The random variable $\nu(a, b; n)$ coincides with the number of upcrossings of the interval $[0, b - a]$ by the sequence $(X_n - a)^+$. Now $\{(X_n - a)^+, \mathfrak{F}_n;\ n \ge 0\}$ is a nonnegative submartingale (see Example 15.1.4) and therefore, without loss of generality, one can assume that $a = 0$ and $X_n \ge 0$, and aim to prove that

$$\mathbf{E}\,\nu(0, b; n) \le \frac{\mathbf{E}X_n}{b}.$$

Let

$$\eta_j := \begin{cases}1 & \text{if } \nu_k < j \le \nu_{k+1} \text{ for some odd } k, \\ 0 & \text{if } \nu_k < j \le \nu_{k+1} \text{ for some even } k.\end{cases}$$

Fig. 15.1 Illustration to the proof of Theorem 15.3.6 showing the locations of the random times $\nu_1$, $\nu_2$, and $\nu_3$ (here $a = 0$)

In Fig. 15.1, $\nu_1 = 2$, $\nu_2 = 5$, $\nu_3 = 8$; $\eta_j = 0$ for $j \le 2$, $\eta_j = 1$ for $3 \le j \le 5$ etc.
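It is not hard to see (using the Abel transform) that (with $X_0 = 0$, $\eta_0 = 1$)

$$\eta_0X_0 + \sum_{j=1}^{n}\eta_j(X_j - X_{j-1}) = \sum_{j=0}^{n-1}X_j(\eta_j - \eta_{j+1}) + \eta_nX_n \ge b\,\nu(0, b; n).$$

Moreover (here $N_1$ denotes the set of odd numbers),

$$\{\eta_j = 1\} = \bigcup_{k \in N_1}\{\nu_k < j \le \nu_{k+1}\} = \bigcup_{k \in N_1}\bigl[\{\nu_k \le j - 1\} - \{\nu_{k+1} \le j - 1\}\bigr] \in \mathfrak{F}_{j-1}.$$

Therefore, by virtue of the relation $\mathbf{E}(X_j \mid \mathfrak{F}_{j-1}) - X_{j-1} \ge 0$, we obtain

$$\begin{aligned}
b\,\mathbf{E}\,\nu(0, b; n) &\le \mathbf{E}\sum_{j=1}^{n}\eta_j(X_j - X_{j-1}) = \sum_{j=1}^{n}\mathbf{E}(X_j - X_{j-1};\ \eta_j = 1) \\
&= \sum_{j=1}^{n}\mathbf{E}\bigl[\mathbf{E}(X_j - X_{j-1} \mid \mathfrak{F}_{j-1});\ \eta_j = 1\bigr] = \sum_{j=1}^{n}\mathbf{E}\bigl[\mathbf{E}(X_j \mid \mathfrak{F}_{j-1}) - X_{j-1};\ \eta_j = 1\bigr] \\
&\le \sum_{j=1}^{n}\mathbf{E}\bigl[\mathbf{E}(X_j \mid \mathfrak{F}_{j-1}) - X_{j-1}\bigr] = \sum_{j=1}^{n}\mathbf{E}(X_j - X_{j-1}) = \mathbf{E}X_n.
\end{aligned}$$

The theorem is proved. □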


15.4  Convergence  Theorems 

Theorem 15.4.1 (Doob’s martingale convergence theorem) Let

$$\{X_n, \mathfrak{F}_n;\ -\infty < n < \infty\}$$

be a submartingale. Then

(1) The limit $X_{-\infty} := \lim_{n \to -\infty}X_n$ exists a.s., $\mathbf{E}X_{-\infty}^+ < \infty$, and the process $\{X_n, \mathfrak{F}_n;\ -\infty \le n < \infty\}$ is a submartingale.

(2) If $\sup_n \mathbf{E}X_n^+ < \infty$ then $X_\infty := \lim_{n \to \infty}X_n$ exists a.s. and $\mathbf{E}X_\infty^+ < \infty$. If, moreover, $\sup_n \mathbf{E}|X_n| < \infty$ then $\mathbf{E}|X_\infty| < \infty$.

(3) The random sequence $\{X_n, \mathfrak{F}_n;\ -\infty \le n \le \infty\}$ forms a submartingale if and only if the sequence $\{X_n^+\}$ is uniformly integrable.

Proof (1) Since

$$\{\limsup X_n > \liminf X_n\} = \bigcup_{\text{rational } a, b}\{\limsup X_n > b > a > \liminf X_n\}$$

(here the limits are taken as $n \to -\infty$), the assumption on divergence with positive probability

$$\mathbf{P}(\limsup X_n > \liminf X_n) > 0$$

means that there exist rational numbers $a < b$ such that

$$\mathbf{P}(\limsup X_n > b > a > \liminf X_n) > 0. \tag{15.4.1}$$

Let $\nu(a, b; m)$ be the number of upcrossings of the interval $[a, b]$ by the sequence $Y_1 = X_{-m}, \ldots, Y_m = X_{-1}$ and $\nu(a, b) = \lim_{m \to \infty}\nu(a, b; m)$. Then (15.4.1) means that

$$\mathbf{P}(\nu(a, b) = \infty) > 0. \tag{15.4.2}$$

By Theorem 15.3.6 (applied to the sequence $Y_1, \ldots, Y_m$),

$$\mathbf{E}\,\nu(a, b; m) \le \frac{\mathbf{E}(X_{-1} - a)^+}{b - a} \le \frac{\mathbf{E}X_{-1}^+ + |a|}{b - a}, \tag{15.4.3}$$

$$\mathbf{E}\,\nu(a, b) \le \frac{\mathbf{E}X_{-1}^+ + |a|}{b - a}. \tag{15.4.4}$$

Inequality (15.4.4) contradicts (15.4.2) and hence proves that

$$\mathbf{P}(\limsup X_n = \liminf X_n) = 1.$$

Moreover, by the Fatou-Lebesgue theorem ($X_{-\infty}^+ := \liminf X_n^+$),

$$\mathbf{E}X_{-\infty}^+ \le \liminf \mathbf{E}X_n^+ \le \mathbf{E}X_{-1}^+ < \infty. \tag{15.4.5}$$

Here the second inequality follows from the fact that $\{X_n^+, \mathfrak{F}_n\}$ is also a submartingale (see Lemma 15.1.3) and therefore $\mathbf{E}X_n^+$ is nondecreasing in $n$.

By Lemma 15.1.2, to prove that $\{X_n, \mathfrak{F}_n;\ -\infty \le n < \infty\}$ is a submartingale, it suffices to verify that, for any $A \in \mathfrak{F}_{-\infty} \subset \mathfrak{F}$,

$$\mathbf{E}(X_{-\infty};\ A) \le \mathbf{E}(X_n;\ A). \tag{15.4.6}$$

Set $X_n(a) := \max(X_n, a)$. By Lemma 15.1.4, $\{X_n(a), \mathfrak{F}_n;\ n \le 0\}$ is a uniformly integrable submartingale. Therefore, for any $-\infty < k < n$,

$$\mathbf{E}(X_k(a);\ A) \le \mathbf{E}(X_n(a);\ A),$$

$$\mathbf{E}(X_{-\infty}(a);\ A) = \lim_{k \to -\infty}\mathbf{E}(X_k(a);\ A) \le \mathbf{E}(X_n(a);\ A). \tag{15.4.7}$$

Letting $a \to -\infty$ we obtain (15.4.6) from the monotone convergence theorem.

(2) The second assertion of the theorem is proved in the same way. One just has to replace $\mathbf{E}X_{-1}^+$ in the right-hand sides of (15.4.3) and (15.4.4) with $\mathbf{E}X_m^+$ and $\sup_n \mathbf{E}X_n^+$, respectively. Instead of (15.4.5) we get (the limits here are as $n \to \infty$)

$$\mathbf{E}X_\infty^+ \le \liminf \mathbf{E}X_n^+ < \infty,$$

and if $\sup_n \mathbf{E}|X_n| < \infty$ then

$$\mathbf{E}|X_\infty| \le \liminf \mathbf{E}|X_n| < \infty.$$

(3) The last assertion of the theorem is proved in exactly the same way as the first one: the uniform integrability enables us to deduce along with (15.4.7) that, for any $A \in \mathfrak{F}_n$,

$$\mathbf{E}(X_\infty(a);\ A) = \lim_{k \to \infty}\mathbf{E}(X_k(a);\ A) \ge \mathbf{E}(X_n(a);\ A).$$

The converse part of the third assertion of the theorem follows from Lemma 15.1.4. The theorem is proved. □

Now  we  will  obtain  some  consequences  of  Theorem  15.4.1. 

So far (see Sect. 4.8), while studying convergence of conditional expectations, we dealt with expectations of the form $\mathbf{E}(X_n \mid \mathfrak{F})$. Now we can obtain from Theorem 15.4.1 a useful theorem on convergence of conditional expectations of another type.

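Theorem 15.4.2 (Lévy) Let a nondecreasing family $\mathfrak{F}_1 \subseteq \mathfrak{F}_2 \subseteq \cdots \subseteq \mathfrak{F}$ of $\sigma$-algebras and a random variable $\xi$, with $\mathbf{E}|\xi| < \infty$, be given on a probability space $(\Omega, \mathfrak{F}, \mathbf{P})$. Let, as before, $\mathfrak{F}_\infty := \sigma\bigl(\bigcup_n \mathfrak{F}_n\bigr)$ be the $\sigma$-algebra generated by events from $\mathfrak{F}_1, \mathfrak{F}_2, \ldots$ Then, as $n \to \infty$,

$$\mathbf{E}(\xi \mid \mathfrak{F}_n) \xrightarrow{\text{a.s.}} \mathbf{E}(\xi \mid \mathfrak{F}_\infty). \tag{15.4.8}$$

Proof Set $X_n := \mathbf{E}(\xi \mid \mathfrak{F}_n)$. We already know (see Example 15.1.3) that the sequence $\{X_n, \mathfrak{F}_n;\ 1 \le n \le \infty\}$ is a martingale and therefore, by Theorem 15.4.1, the limit $\lim_{n \to \infty}X_n = X_{(\infty)}$ exists a.s. It remains to prove that $X_{(\infty)} = \mathbf{E}(\xi \mid \mathfrak{F}_\infty)$ (i.e., that $X_{(\infty)} = X_\infty$). Since $\{X_n, \mathfrak{F}_n;\ 1 \le n \le \infty\}$ is by Lemma 15.1.4 a uniformly integrable martingale,

$$\mathbf{E}(X_{(\infty)};\ A) = \lim_{n \to \infty}\mathbf{E}(X_n;\ A) = \lim_{n \to \infty}\mathbf{E}\bigl(\mathbf{E}(\xi \mid \mathfrak{F}_n);\ A\bigr) = \mathbf{E}(\xi;\ A)$$

for $A \in \mathfrak{F}_k$ and any $k = 1, 2, \ldots$ This means that the left- and right-hand sides of the last relation, being finite measures, coincide on the algebra $\bigcup_{n=1}^{\infty}\mathfrak{F}_n$. By the theorem on extension of a measure (see Appendix 1), they will coincide for all $A \in \sigma\bigl(\bigcup_{n=1}^{\infty}\mathfrak{F}_n\bigr) = \mathfrak{F}_\infty$. Therefore, by the definition of conditional expectation,

$$X_{(\infty)} = \mathbf{E}(\xi \mid \mathfrak{F}_\infty) = X_\infty.$$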

The  theorem  is  proved.  □ 
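A concrete instance of Theorem 15.4.2 can be computed by hand. The sketch below assumes, for illustration only, that $\Omega = [0, 1)$ with Lebesgue measure, that $\mathfrak{F}_n$ is generated by the dyadic intervals $[j2^{-n}, (j+1)2^{-n})$, and that $\xi(\omega) = \omega^2$; none of these choices comes from the text. In this setting $\mathbf{E}(\xi \mid \mathfrak{F}_n)(\omega)$ is the average of $\xi$ over the dyadic interval containing $\omega$, and $\mathfrak{F}_\infty$ is the Borel $\sigma$-algebra, so the limit should be $\xi(\omega)$ itself.

```python
from fractions import Fraction

def cond_exp_dyadic(omega, n):
    """E(xi | F_n)(omega) for xi(w) = w**2 on [0, 1) with Lebesgue measure,
    where F_n is generated by the intervals [j / 2**n, (j + 1) / 2**n)."""
    j = int(omega * 2**n)                    # index of the interval containing omega
    a, b = Fraction(j, 2**n), Fraction(j + 1, 2**n)
    # (1 / (b - a)) * integral of w**2 over [a, b) = (a^2 + a*b + b^2) / 3
    return (a * a + a * b + b * b) / 3

omega = Fraction(1, 3)
errors = [abs(cond_exp_dyadic(omega, n) - omega**2) for n in range(1, 21)]
print([float(e) for e in errors[:5]], float(errors[-1]))
```

The martingale property can be seen in the same example: the value on a dyadic interval of rank $n$ is the mean of the values on its two halves of rank $n + 1$.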

We could also note that the uniform integrability of $\{X_n, \mathfrak{F}_n;\ 1 \le n \le \infty\}$ implies that the a.s. convergence in (15.4.8) can be replaced by convergence in mean.

Theorem 15.4.1 implies the strong law of large numbers. Indeed, turn to our Example 15.1.4. By Theorem 15.4.1, the limit $X_{-\infty} = \lim_{n \to -\infty}X_n = \lim_{n \to \infty}n^{-1}S_n$ exists a.s. and is measurable with respect to the tail (trivial) $\sigma$-algebra, and therefore it is constant with probability 1. Since $\mathbf{E}X_{-\infty} = \mathbf{E}\xi_1$, we have $n^{-1}S_n \xrightarrow{\text{a.s.}} \mathbf{E}\xi_1$.

One can also obtain some extensions of the theorems on series convergence of Chap. 11 to the case of dependent variables. Let

$$X_n = S_n = \sum_{k=1}^{n}\xi_k$$

and $X_n$ form a submartingale ($\mathbf{E}(\xi_{n+1} \mid \mathfrak{F}_n) \ge 0$). Let, moreover, $\mathbf{E}|X_n| < c$ for all $n$ and for some $c < \infty$. Then the limit $S_\infty = \lim_{n \to \infty}S_n$ exists a.s. (As well as Theorem 15.4.1, this assertion is a generalisation of the monotone convergence theorem. The crucial role is played here by the condition that $\mathbf{E}|X_n|$ is bounded.) In particular, if $\xi_k$ are independent, $\mathbf{E}\xi_k = 0$, and the variances $\sigma_k^2$ of $\xi_k$ are such that $\sum_{k=1}^{\infty}\sigma_k^2 \le \sigma^2 < \infty$, then

$$\mathbf{E}|X_n| \le (\mathbf{E}X_n^2)^{1/2} \le \Bigl(\sum_{k=1}^{n}\sigma_k^2\Bigr)^{1/2} \le \sigma < \infty,$$

and therefore $S_n \xrightarrow{\text{a.s.}} S_\infty$. Thus we obtain, as a consequence, the Kolmogorov theorem on series convergence.


Example 15.4.1 Consider a branching process $\{Z_n\}$ (see Sect. 7.7). We know that $Z_n$ admits a representation

$$Z_n = \zeta_1 + \cdots + \zeta_{Z_{n-1}},$$

where the $\zeta_k$ are identically distributed integer-valued random variables independent of each other and of $Z_{n-1}$, $\zeta_k$ being the number of descendants of the $k$-th particle from the $(n-1)$-th generation. Assuming that $Z_0 = 1$ and setting $\mu := \mathbf{E}\zeta_k$, we obtain

$$\mathbf{E}(Z_n \mid Z_{n-1}) = \mu Z_{n-1}, \qquad \mathbf{E}Z_n = \mu\,\mathbf{E}Z_{n-1} = \mu^n.$$

This implies that $X_n = Z_n/\mu^n$ is a martingale, because

$$\mathbf{E}(X_n \mid Z_{n-1}) = \mu^{1-n}Z_{n-1} = X_{n-1}.$$


For  branching  processes  we  have  the  following. 

Theorem 15.4.3 The sequence $X_n = \mu^{-n}Z_n$ converges almost surely to a proper random variable $X$ with $\mathbf{E}X < \infty$. The ch.f. $\varphi(\lambda)$ of the random variable $X$ satisfies the equation

$$\varphi(\mu\lambda) = p(\varphi(\lambda)),$$

where $p(v) = \mathbf{E}v^{\zeta}$.

Theorem 15.4.3 means that $\mu^{-n}Z_n$ has a proper limiting distribution as $n \to \infty$.

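Proof Since $X_n \ge 0$ and $\mathbf{E}X_n = 1$, the first assertion follows immediately from Theorem 15.4.1.

Since $\mathbf{E}z^{Z_n}$ is equal to the $n$-th iteration of the function $p(z)$, for the ch.f. of $Z_n$ we have ($\varphi_\eta(\lambda) := \mathbf{E}e^{i\lambda\eta}$)

$$\varphi_{Z_n}(\lambda) = p\bigl(\varphi_{Z_{n-1}}(\lambda)\bigr),$$

$$\varphi_{X_n}(\lambda) = \varphi_{Z_n}(\mu^{-n}\lambda) = p\bigl(\varphi_{Z_{n-1}}(\mu^{-n}\lambda)\bigr) = p\bigl(\varphi_{X_{n-1}}(\lambda/\mu)\bigr).$$

Because $X_n \Rightarrow X$ and the function $p$ is continuous, from this we obtain the equation for the ch.f. of the limiting distribution $X$:

$$\varphi(\lambda) = p\bigl(\varphi(\lambda/\mu)\bigr).$$

The theorem is proved. □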


In Sect. 7.7 we established that in the case $\mu \le 1$ the process $Z_n$ becomes extinct with probability 1 and therefore $\mathbf{P}(X = 0) = 1$. We verify now that, for $\mu > 1$, the distribution of $X$ is nondegenerate (not concentrated at zero). It suffices to prove that $\{X_n,\ 0 \le n \le \infty\}$ forms a martingale and consequently

$$\mathbf{E}X = \mathbf{E}X_n \ne 0.$$

By Theorem 15.4.1, it suffices to verify that the sequence $X_n$ is uniformly integrable. To simplify the reasoning, we suppose that $\mathrm{Var}(\zeta_k) = \sigma^2 < \infty$ and show that then $\mathbf{E}X_n^2 < c < \infty$ (this certainly implies the required uniform integrability of $X_n$, see Sect. 6.1). One can directly verify the identity

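$$Z_n^2 - \mu^{2n} = \sum_{k=1}^{n}\bigl[Z_k^2 - (\mu Z_{k-1})^2\bigr]\mu^{2n-2k}.$$

Since $\mathbf{E}[Z_k^2 - (\mu Z_{k-1})^2 \mid Z_{k-1}] = \sigma^2Z_{k-1}$ (recall that $\mathrm{Var}(\eta) = \mathbf{E}(\eta^2 - (\mathbf{E}\eta)^2)$), we have

$$\mathrm{Var}(Z_n) = \mathbf{E}(Z_n^2 - \mu^{2n}) = \sum_{k=1}^{n}\mu^{2n-2k}\sigma^2\,\mathbf{E}Z_{k-1} = \mu^{2n}\sigma^2\sum_{k=1}^{n}\mu^{-k-1} = \frac{\sigma^2\mu^n(\mu^n - 1)}{\mu(\mu - 1)},$$

$$\mathbf{E}X_n^2 = \mu^{-2n}\mathbf{E}Z_n^2 = 1 + \frac{\sigma^2(1 - \mu^{-n})}{\mu(\mu - 1)} < 1 + \frac{\sigma^2}{\mu(\mu - 1)}.$$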
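The closed form for $\mathbf{E}X_n^2$ can be checked exactly on a small example. The sketch below assumes a hypothetical offspring distribution $\mathbf{P}(\zeta = 0) = \mathbf{P}(\zeta = 1) = 1/4$, $\mathbf{P}(\zeta = 2) = 1/2$ (so $\mu = 5/4 > 1$), chosen only for illustration. It builds the generating function of $Z_n$ as the $n$-th iterate of $p$ by polynomial composition and compares the resulting second moment with the formula.

```python
from fractions import Fraction

def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

def compose(f, g):
    """Coefficients of f(g(z)), computed by Horner's scheme."""
    out = [Fraction(0)]
    for c in reversed(f):
        out = poly_mul(out, g)
        out[0] += c
    return out

# Hypothetical offspring law: P(zeta = 0), P(zeta = 1), P(zeta = 2).
offspring = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]
mu = sum(k * q for k, q in enumerate(offspring))
sigma2 = sum(k * k * q for k, q in enumerate(offspring)) - mu**2

def second_moment_of_X(n):
    """E X_n^2 = mu^(-2n) E Z_n^2, with the law of Z_n read off its generating function."""
    gen = [Fraction(0), Fraction(1)]       # generating function of Z_0 = 1
    for _ in range(n):
        gen = compose(gen, offspring)      # n-th iterate of p
    ez2 = sum(k * k * c for k, c in enumerate(gen))
    return ez2 / mu**(2 * n)

def closed_form(n):
    return 1 + sigma2 * (1 - mu**(-n)) / (mu * (mu - 1))

for n in range(1, 5):
    print(n, second_moment_of_X(n), closed_form(n))
```

Rational arithmetic is used so that the two expressions can be compared for equality rather than up to rounding.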

Thus we have proved that $X$ is a nondegenerate random variable,

$$\mathbf{E}X = 1, \qquad \mathrm{Var}(X_n) \to \frac{\sigma^2}{\mu(\mu - 1)}.$$

From the last relation one can easily obtain that $\mathrm{Var}(X) = \frac{\sigma^2}{\mu(\mu - 1)}$. To this end one can, say, prove that $X_n$ is a Cauchy sequence in mean quadratic and hence (see Theorem 6.1.3) $X_n \xrightarrow{(2)} X$.

15.5 Boundedness of the Moments of Stochastic Sequences

When one uses convergence theorems for martingales, conditions ensuring boundedness of the moments of stochastic sequences $\{X_n, \mathfrak{F}_n\}$ are of significant interest (recall that the boundedness of $\mathbf{E}X_n$ is one of the crucial conditions for convergence of submartingales). The boundedness of the moments, in turn, ensures that $X_n$ is stochastically bounded, i.e., that $\sup_n \mathbf{P}(X_n > N) \to 0$ as $N \to \infty$. The last boundedness is also of independent interest in the cases where one is not able to prove, for the sequence $\{X_n\}$, convergence or any other ergodic properties.

For simplicity’s sake, we confine ourselves to considering nonnegative sequences $X_n \ge 0$. Of course, if we could prove convergence of the distributions of $X_n$ to a limiting distribution, as was the case for Markov chains or submartingales in Theorem 15.4.1, then we would have a more detailed description of the asymptotic behaviour of $X_n$ as $n \to \infty$. This convergence, however, requires that the sequence $X_n$ satisfies stronger constraints than will be used below.

The basic and rather natural elements of the boundedness conditions to be considered below are: the boundedness of the moments of $\xi_n = X_n - X_{n-1}$ of the respective orders and the presence of a negative “drift” $\mathbf{E}(\xi_n \mid \mathfrak{F}_{n-1})$ in the domain $X_{n-1} > N$ for sufficiently large $N$. Such a property has already been utilised for Markov chains; see Corollary 13.7.1 (otherwise the trajectory of $X_n$ may go to $\infty$).

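Let us begin with exponential moments. The simplest conditions ensuring the boundedness of $\sup_n \mathbf{E}e^{\lambda X_n}$ for some $\lambda > 0$ are as follows: for all $n \ge 1$ and some $\lambda > 0$ and $N < \infty$,

$$\mathbf{E}(e^{\lambda\xi_n} \mid \mathfrak{F}_{n-1})\,\mathrm{I}(X_{n-1} > N) \le \beta(\lambda) < 1, \tag{15.5.1}$$

$$\mathbf{E}(e^{\lambda\xi_n} \mid \mathfrak{F}_{n-1})\,\mathrm{I}(X_{n-1} \le N) \le \psi(\lambda) < \infty. \tag{15.5.2}$$


Theorem 15.5.1 If conditions (15.5.1) and (15.5.2) hold then

$$\mathbf{E}(e^{\lambda X_n} \mid \mathfrak{F}_0) \le \beta^n(\lambda)e^{\lambda X_0} + \frac{e^{\lambda N}\psi(\lambda)}{1 - \beta(\lambda)}. \tag{15.5.3}$$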


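Proof Denote by $A_n$ the left-hand side of (15.5.3). Then, by virtue of (15.5.1) and (15.5.2), we obtain

$$\begin{aligned}
A_n &= \mathbf{E}\bigl\{\mathbf{E}\bigl[e^{\lambda X_n}\bigl(\mathrm{I}(X_{n-1} > N) + \mathrm{I}(X_{n-1} \le N)\bigr) \mid \mathfrak{F}_{n-1}\bigr] \mid \mathfrak{F}_0\bigr\} \\
&\le \mathbf{E}\bigl[e^{\lambda X_{n-1}}\bigl(\beta(\lambda)\,\mathrm{I}(X_{n-1} > N) + \psi(\lambda)\,\mathrm{I}(X_{n-1} \le N)\bigr) \mid \mathfrak{F}_0\bigr] \\
&\le \beta(\lambda)A_{n-1} + e^{\lambda N}\psi(\lambda).
\end{aligned}$$

This immediately implies that

$$A_n \le A_0\beta^n(\lambda) + e^{\lambda N}\psi(\lambda)\sum_{k=0}^{n-1}\beta^k(\lambda) \le A_0\beta^n(\lambda) + \frac{e^{\lambda N}\psi(\lambda)}{1 - \beta(\lambda)}.$$

The theorem is proved. □

The conditions

$$\mathbf{E}(\xi_n \mid \mathfrak{F}_{n-1}) \le -\varepsilon < 0 \quad \text{on the } \omega\text{-set } \{X_{n-1} > N\}, \tag{15.5.4}$$

$$\mathbf{E}(e^{\lambda|\xi_n|} \mid \mathfrak{F}_{n-1}) \le \psi_1(\lambda) < \infty \quad \text{for some } \lambda > 0 \tag{15.5.5}$$

are sufficient for (15.5.1) and (15.5.2).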

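The first condition means that $Y_n := (X_n + \varepsilon n)\,\mathrm{I}(X_{n-1} > N)$ is a supermartingale.

We now prove sufficiency of (15.5.4) and (15.5.5). That (15.5.2) holds is clear. Further, make use of the inequality

$$e^x \le 1 + x + \frac{x^2}{2}e^{|x|},$$

which follows from the Taylor formula for $e^x$ with the remainder in the Cauchy form:

$$e^x = 1 + x + \frac{x^2}{2}e^{\theta x}, \qquad \theta \in [0, 1].$$

Then, on the set $\{X_{n-1} > N\}$, one has

$$\mathbf{E}(e^{\lambda\xi_n} \mid \mathfrak{F}_{n-1}) \le 1 - \lambda\varepsilon + \frac{\lambda^2}{2}\mathbf{E}\bigl(\xi_n^2e^{\lambda|\xi_n|} \mid \mathfrak{F}_{n-1}\bigr).$$

Since $x^2 < e^{\lambda x/2}$ for sufficiently large $x$, by the Hölder inequality it follows that, together with (15.5.5), we will have

$$\mathbf{E}\bigl(\xi_n^2e^{\lambda|\xi_n|/2} \mid \mathfrak{F}_{n-1}\bigr) \le \psi_2(\lambda) < \infty.$$

This implies that, for sufficiently small $\lambda$, one has on the set $\{X_{n-1} > N\}$ the inequality

$$\mathbf{E}(e^{\lambda\xi_n} \mid \mathfrak{F}_{n-1}) \le 1 - \lambda\varepsilon + \frac{\lambda^2}{2}\psi_2(\lambda) =: \beta(\lambda) \le 1 - \frac{\lambda\varepsilon}{2} < 1.$$

This proves (15.5.1). □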
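To see the bound (15.5.3) of Theorem 15.5.1 at work, consider a simple sequence for which everything can be computed. The sketch assumes, for illustration only, the reflected walk $X_n = \max(X_{n-1} + \zeta_n, 0)$ with $X_0 = 0$ and independent steps $\zeta_n = +1$ with probability $p = 0.3$ and $-1$ otherwise; the model, the value $\lambda = 0.4$ and the function names are not from the text. With $N = 0$ one can take $\beta(\lambda) = pe^{\lambda} + (1 - p)e^{-\lambda}$ in (15.5.1) and $\psi(\lambda) = pe^{\lambda} + (1 - p)$ in (15.5.2).

```python
import math

def exp_moment_reflected_walk(n, p, lam):
    """Exact E exp(lam * X_n) for X_k = max(X_{k-1} + zeta_k, 0), X_0 = 0,
    where zeta_k = +1 with probability p and -1 with probability 1 - p."""
    dist = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for x, w in dist.items():
            up, down = x + 1, max(x - 1, 0)
            nxt[up] = nxt.get(up, 0.0) + w * p
            nxt[down] = nxt.get(down, 0.0) + w * (1 - p)
        dist = nxt
    return sum(w * math.exp(lam * x) for x, w in dist.items())

p, lam, N = 0.3, 0.4, 0
beta = p * math.exp(lam) + (1 - p) * math.exp(-lam)   # bound in (15.5.1) on {X_{n-1} > N}
psi = p * math.exp(lam) + (1 - p)                      # bound in (15.5.2) on {X_{n-1} <= N}

def bound(n):
    """Right-hand side of (15.5.3) with X_0 = 0."""
    return beta**n + math.exp(lam * N) * psi / (1 - beta)

for n in (0, 1, 5, 30):
    print(n, exp_moment_reflected_walk(n, p, lam), bound(n))
```

The condition $\beta(\lambda) < 1$ holds here because $0 < \lambda < \ln((1-p)/p)$; for larger $\lambda$ the sketch would no longer satisfy (15.5.1).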


Corollary 15.5.1 If in addition to the conditions of Theorem 15.5.1, the distribution of $X_n$ converges to a limiting distribution: $\mathbf{P}(X_n < t) \Rightarrow \mathbf{P}(X < t)$, then

$$\mathbf{E}e^{\lambda X} \le \frac{e^{\lambda N}\psi(\lambda)}{1 - \beta(\lambda)}.$$

The corollary follows from the Fatou-Lebesgue theorem (see also Lemma 6.1.1):

$$\mathbf{E}e^{\lambda X} \le \liminf_{n \to \infty}\mathbf{E}e^{\lambda X_n}. \qquad □$$

We now obtain bounds for “conventional” moments. Set

$$M^l(n) := \mathbf{E}X_n^l,$$

$$m(0) := 1, \qquad m(1) := \sup_{n \ge 1}\ \sup_{\omega \in \{X_{n-1} > N\}}\mathbf{E}(\xi_n \mid \mathfrak{F}_{n-1}),$$

$$m(l) := \sup_{n \ge 1}\ \sup_{\omega}\mathbf{E}(|\xi_n|^l \mid \mathfrak{F}_{n-1}), \qquad l > 1.$$

Theorem 15.5.2 Assume that $\mathbf{E}X_0^s < \infty$ for some $s > 1$ and there exist $N \ge 0$ and $\varepsilon > 0$ such that

$$m(1) \le -\varepsilon, \tag{15.5.6}$$

$$m(s) < c < \infty. \tag{15.5.7}$$

Then

$$\liminf_{n \to \infty}M^{s-1}(n) < \infty. \tag{15.5.8}$$

If, moreover,

$$M^s(n + 1) > M^s(n) - c_1 \tag{15.5.9}$$

for some $c_1 > 0$, then

$$\sup_n M^{s-1}(n) < \infty. \tag{15.5.10}$$

Corollary 15.5.2 If conditions (15.5.6) and (15.5.7) are met and the distribution of $X_n$ converges weakly to a limiting distribution: $\mathbf{P}(X_n < t) \Rightarrow \mathbf{P}(X < t)$, then $\mathbf{E}X^{s-1} < \infty$.


This assertion follows from the Fatou-Lebesgue theorem (see also Lemma 6.1.1), which implies

$$\mathbf{E}X^{s-1} \le \liminf_{n \to \infty}\mathbf{E}X_n^{s-1}. \qquad □$$


The assertion of Corollary 15.5.2 is unimprovable. One can see this from the example of the sequence $X_n = (X_{n-1} + \zeta_n)^+$, where $\zeta_k \overset{d}{=} \zeta$ are independent and identically distributed. If $\mathbf{E}\zeta_k < 0$ then the limiting distribution of $X_n$ coincides with the distribution of $S = \sup_k S_k$ (see Sect. 12.4). From factorisation identities one can derive that $\mathbf{E}S^{s-1}$ is finite if and only if $\mathbf{E}(\zeta^+)^s < \infty$. An outline of the proof is as follows. Theorem 12.3.2 implies that $\mathbf{E}S^k = c\,\mathbf{E}(\chi_+^k;\ \eta_+ < \infty)$, $c = \text{const} < \infty$. It follows from Corollary 12.2.2 that

$$1 - \mathbf{E}(e^{i\lambda\chi_+};\ \eta_+ < \infty) = (1 - \mathbf{E}e^{i\lambda\zeta})\int_0^\infty e^{-i\lambda x}\,dH(x),$$

where $H(x)$ is the renewal function for the random variable $-\chi_-^0 \ge 0$. Since

$$a_1 + b_1x \le H(x) \le a_2 + b_2x$$

(see Theorem 10.1.1 and Lemma 10.1.1; $a_i$, $b_i$ are constants), integrating the convolution

$$\mathbf{P}(\chi_+ > v,\ \eta_+ < \infty) = \int_0^\infty \mathbf{P}(\zeta > v + x)\,dH(x)$$

by parts we verify that, as $v \to \infty$, the left-hand side has the same order of magnitude as $\int_0^\infty \mathbf{P}(\zeta > v + x)\,dx$. Hence the required statement follows.

We now return to Theorem 15.5.2. Note that in all of the most popular problems the sequence $M^{s-1}(n)$ behaves “regularly”: either it is bounded or $M^{s-1}(n) \to \infty$. Assertion (15.5.8) means that, under the conditions of Theorem 15.5.2, the second possibility is excluded. Condition (15.5.9) ensuring (15.5.10) is also rather broad.


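Proof of Theorem 15.5.2 Let for simplicity’s sake $s > 1$ be an integer. We have

$$\mathbf{E}(X_n^s;\ X_{n-1} > N) = \int_N^\infty \mathbf{E}\bigl((x + \xi_n)^s;\ X_{n-1} \in dx\bigr) = \sum_{l=0}^{s}\binom{s}{l}\int_N^\infty x^l\,\mathbf{E}(\xi_n^{s-l};\ X_{n-1} \in dx).$$

If we replace $\xi_n^{s-l}$ for $s - l \ge 2$ with $|\xi_n|^{s-l}$ then the right-hand side can only increase. Therefore,

$$\mathbf{E}(X_n^s;\ X_{n-1} > N) \le \sum_{l=0}^{s}\binom{s}{l}m(s - l)\,M_N^l(n - 1),$$

where

$$M_N^l(n) = \mathbf{E}(X_n^l;\ X_n > N).$$

The moments $M^s(n) = \mathbf{E}X_n^s$ satisfy the inequalities

$$\begin{aligned}
M^s(n) &\le \mathbf{E}\bigl[(N + |\xi_n|)^s;\ X_{n-1} \le N\bigr] + \sum_{l=0}^{s}\binom{s}{l}m(s - l)\,M_N^l(n - 1) \\
&\le 2^s[N^s + c] + \sum_{l=0}^{s}\binom{s}{l}m(s - l)\,M_N^l(n - 1).
\end{aligned} \tag{15.5.11}$$

Suppose now that (15.5.8) does not hold: $M^{s-1}(n) \to \infty$. Then all the more $M^s(n) \to \infty$ and there exists a subsequence $n'$ such that $M^s(n') > M^s(n' - 1)$. Since $M^l(n) \le [M^{l+1}(n)]^{l/(l+1)}$, we obtain from (15.5.6) and (15.5.11) that

$$\begin{aligned}
M^s(n') &\le \text{const} + M^s(n' - 1) + sM^{s-1}(n' - 1)\,m(1) + o\bigl(M^{s-1}(n' - 1)\bigr) \\
&\le M^s(n' - 1) - \frac{1}{2}s\varepsilon M^{s-1}(n' - 1)
\end{aligned}$$

for sufficiently large $n'$. This contradicts the assumption that $M^s(n) \to \infty$ and hence proves (15.5.8).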

We now prove (15.5.10). If this relation is not true then there exists a sequence $n'$ such that $M^{s-1}(n') \to \infty$ and $M^s(n') > M^s(n' - 1) - c_1$. It remains to make use of the above argument.

We leave the proof for a non-integer $s > 1$ to the reader (the changes are elementary). The theorem is proved. □

Remark 15.5.1 (1) The assertions of Theorems 15.5.1 and 15.5.2 will remain valid if one requires inequalities (15.5.4) or $\mathbf{E}(\xi_n + \varepsilon \mid \mathfrak{F}_{n-1})\,\mathrm{I}(X_{n-1} > N) \le 0$ to hold not for all $n$, but only for $n \ge n_0$ for some $n_0 > 1$.

(2) As in Theorem 15.5.1, condition (15.5.6) means that the sequence of random variables $(X_n + \varepsilon n)\,\mathrm{I}(X_{n-1} > N)$ forms a supermartingale.

(3) The conditions of Theorems 15.5.1 and 15.5.2 may be weakened by replacing them with “averaged” conditions. Consider, for instance, condition (15.5.1). By integrating it over the set $\{X_{n-1} > x > N\}$ we obtain

$$\mathbf{E}(e^{\lambda\xi_n};\ X_{n-1} > x) \le \beta(\lambda)\,\mathbf{P}(X_{n-1} > x)$$

or, which is the same,

$$\mathbf{E}(e^{\lambda\xi_n} \mid X_{n-1} > x) \le \beta(\lambda). \tag{15.5.12}$$

The converse assertion that (15.5.12) for all $x > N$ implies relation (15.5.1) is obviously false, so that condition (15.5.12) is weaker than (15.5.1). A similar remark is true for condition (15.5.4).


One  has  the  following  generalisations  of  Theorems  15.5.1  and  15.5.2  to  the  case 
of  “averaged  conditions”. 


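Theorem 15.5.1A Let, for some $\lambda > 0$, $N > 0$ and all $x \ge N$,

$$\mathbf{E}(e^{\lambda\xi_n} \mid X_{n-1} > x) \le \beta(\lambda) < 1,$$

$$\mathbf{E}(e^{\lambda\xi_n};\ X_{n-1} \le N) \le \psi(\lambda) < \infty.$$

Then

$$\mathbf{E}e^{\lambda X_n} \le \beta^n(\lambda)\,\mathbf{E}e^{\lambda X_0} + \frac{e^{\lambda N}\psi(\lambda)}{1 - \beta(\lambda)}.$$

Put

$$m(1) := \sup_{n \ge 1}\ \sup_{x \ge N}\mathbf{E}(\xi_n \mid X_{n-1} > x),$$

$$m(l) := \sup_{n \ge 1}\ \sup_{x \ge N}\mathbf{E}(|\xi_n|^l \mid X_{n-1} > x), \qquad l > 1.$$

Theorem 15.5.2A Let $\mathbf{E}X_0^s < \infty$ and there exist $N \ge 0$ and $\varepsilon > 0$ such that

$$m(1) \le -\varepsilon, \qquad m(s) < \infty, \qquad \mathbf{E}(|\xi_n|^s;\ X_{n-1} \le N) < c < \infty.$$

Then (15.5.8) holds true. If, in addition, (15.5.9) is valid, then (15.5.10) is true.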

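The proofs of Theorems 15.5.1A and 15.5.2A are quite similar to those of Theorems 15.5.1 and 15.5.2. The only additional element in both cases is integration by parts. We will illustrate this with the proof of Theorem 15.5.1A. Consider

$$\begin{aligned}
\mathbf{E}(e^{\lambda X_n};\ X_{n-1} > N) &= \int_N^\infty e^{\lambda x}\,\mathbf{E}(e^{\lambda\xi_n};\ X_{n-1} \in dx) \\
&= \mathbf{E}(e^{\lambda(N + \xi_n)};\ X_{n-1} > N) + \int_N^\infty \lambda e^{\lambda x}\,\mathbf{E}(e^{\lambda\xi_n};\ X_{n-1} > x)\,dx \\
&\le \mathbf{E}(e^{\lambda(N + \xi_n)};\ X_{n-1} > N) + \beta(\lambda)\int_N^\infty \lambda e^{\lambda x}\,\mathbf{P}(X_{n-1} > x)\,dx \\
&= e^{\lambda N}\,\mathbf{E}(e^{\lambda\xi_n} - \beta(\lambda);\ X_{n-1} > N) + \beta(\lambda)\int_N^\infty e^{\lambda x}\,\mathbf{P}(X_{n-1} \in dx) \\
&\le \beta(\lambda)\,\mathbf{E}(e^{\lambda X_{n-1}};\ X_{n-1} > N).
\end{aligned}$$

From this we find that

$$\begin{aligned}
\beta_n(\lambda) := \mathbf{E}e^{\lambda X_n} &\le \mathbf{E}(e^{\lambda(X_{n-1} + \xi_n)};\ X_{n-1} \le N) + \mathbf{E}(e^{\lambda X_n};\ X_{n-1} > N) \\
&\le e^{\lambda N}\psi(\lambda) + \beta(\lambda)\,\mathbf{E}(e^{\lambda X_{n-1}};\ X_{n-1} > N) \\
&\le e^{\lambda N}\psi(\lambda) - \mathbf{P}(X_{n-1} \le N)\,\beta(\lambda) + \beta(\lambda)\beta_{n-1}(\lambda);
\end{aligned}$$

$$\beta_n(\lambda) \le \beta^n(\lambda)\beta_0(\lambda) + \frac{e^{\lambda N}\psi(\lambda)}{1 - \beta(\lambda)}. \qquad □$$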

Note  that  Theorem  13.7.2  and  Corollary  13.7.1  on  “positive  recurrence”  can  also 
be  referred  to  as  theorems  on  boundedness  of  stochastic  sequences. 


Chapter  16 

Stationary  Sequences 


Abstract Section 16.1 contains the definitions and a discussion of the concepts of strictly stationary sequences and measure preserving transformations. It also presents Poincaré’s theorem on the number of visits to a given set by a stationary sequence. Section 16.2 discusses invariance, ergodicity, mixing and weak dependence. The Birkhoff-Khintchin ergodic theorem is stated and proved in Sect. 16.3.


16.1  Basic  Notions 

Let $(\Omega, \mathfrak{F}, \mathbf{P})$ be a probability space and $\boldsymbol{\xi} = (\xi_0, \xi_1, \ldots)$ an infinite sequence of random variables given on it.

Definition 16.1.1 A sequence $\boldsymbol{\xi}$ is said to be strictly stationary if, for any $k$, the distribution of the vector $(\xi_n, \ldots, \xi_{n+k})$ does not depend on $n$, $n \ge 0$.

Along with the sequence $\boldsymbol{\xi}$, consider the sequence $(\xi_n, \xi_{n+1}, \ldots)$. Since the finite-dimensional distributions of these sequences (i.e. the distributions of the vectors $(\xi_m, \ldots, \xi_{m+k})$) coincide, the distributions of the sequences will also coincide (one has to make use of the measure extension theorem (see Appendix 1) or the Kolmogorov theorem (see Sect. 3.5)). In other words, for a stationary sequence $\boldsymbol{\xi}$, for any $n$ and $B \in \mathfrak{B}^\infty$ (for notation see Sect. 3.5), one has

$$\mathbf{P}(\boldsymbol{\xi} \in B) = \mathbf{P}\bigl((\xi_n, \xi_{n+1}, \ldots) \in B\bigr).$$

The simplest example of a stationary sequence is given by a sequence of independent identically distributed random variables $\boldsymbol{\xi} = (\xi_0, \xi_1, \ldots)$. It is evident that the sequence $\zeta_k = a_0\xi_k + \cdots + a_s\xi_{k+s}$, $k = 0, 1, 2, \ldots$, will also be stationary, but the variables $\zeta_k$ will no longer be independent. The same holds for sequences of the form

$$\zeta_k = \sum_{j=0}^{\infty}a_j\xi_{k+j},$$

provided that $\mathbf{E}|\xi_j| < \infty$, $\sum|a_j| < \infty$, or if $\mathbf{E}\xi_k = 0$, $\mathrm{Var}(\xi_k) < \infty$, $\sum a_j^2 < \infty$ (the latter ensures a.s. convergence of the series of random variables, see Sect. 10.2). In a similar way one can consider stationary sequences $\zeta_k = g(\xi_k, \xi_{k+1}, \ldots)$ “generated” by $\boldsymbol{\xi}$, where $g(x)$ is an arbitrary measurable functional $\mathbb{R}^\infty \to \mathbb{R}$.

Another example is given by stationary Markov chains. If $\{X_n\}$ is a real-valued Markov chain with invariant measure $\pi$ and transition probability $P(\cdot, \cdot)$ then the chain $\{X_n\}$ with $X_0$ distributed according to $\pi$ will form a stationary sequence, because the distribution

$$\mathbf{P}(X_n \in B_0, \ldots, X_{n+k} \in B_k) = \int_{B_0}\pi(dx_0)\int_{B_1}P(x_0, dx_1)\cdots\int_{B_k}P(x_{k-1}, dx_k)$$

will not depend on $n$.

Any stationary sequence $\boldsymbol{\xi} = (\xi_0, \xi_1, \ldots)$ can always be extended to a stationary sequence $\boldsymbol{\xi} = (\ldots, \xi_{-1}, \xi_0, \xi_1, \ldots)$ given on the “whole axis”.

Indeed, for any $n$, $-\infty < n < \infty$, and $k \ge 0$ define the joint distributions of $(\xi_n, \ldots, \xi_{n+k})$ as those of $(\xi_0, \ldots, \xi_k)$. These distributions will clearly be consistent (see Sect. 3.5) and by the Kolmogorov theorem there will exist a unique probability distribution on $\mathbb{R}_{-\infty}^{\infty} = \prod_{k=-\infty}^{\infty}\mathbb{R}_k$ with the respective $\sigma$-algebra such that any finite-dimensional distribution is a projection of that distribution on the corresponding subspace. It remains to take the random element $\boldsymbol{\xi}$ to be the identity mapping of $\mathbb{R}_{-\infty}^{\infty}$ onto itself.

In some of the subsequent sections it will be convenient for us to use stationary sequences given on the whole axis.

Let $\boldsymbol{\xi}$ be such a sequence. Define a transformation $\theta$ of the space $\mathbb{R}_{-\infty}^{\infty}$ onto itself with the help of the relations

$$(\theta x)_k = (x)_{k+1} = x_{k+1}, \tag{16.1.1}$$

where $(x)_k$ is the $k$-th component of the vector $x \in \mathbb{R}_{-\infty}^{\infty}$, $-\infty < k < \infty$. The transformation $\theta$ clearly has the following properties:

1. It is a one-to-one mapping, $\theta^{-1}$ is defined by

$$(\theta^{-1}x)_k = x_{k-1}.$$

2. The sequence $\theta\boldsymbol{\xi}$ is also stationary, its distribution coinciding with that of $\boldsymbol{\xi}$:

$$\mathbf{P}(\theta\boldsymbol{\xi} \in B) = \mathbf{P}(\boldsymbol{\xi} \in B).$$

It  is  natural  to  call  the  last  property  of  the  transformation  6  the  “measure  preserv¬ 
ing”  property. 

The above remarks explain to some extent why, historically, the study of stationary sequences followed the route of studying measure preserving transformations. Studies in that area constitute a substantial part of modern analysis. In what follows, we will relate the construction of stationary sequences to measure preserving transformations, and it will be more convenient to regard the latter as “primary” objects.

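Definition 16.1.2 Let $(\Omega, \mathfrak{F}, \mathbf{P})$ be the basic probability space. A transformation $T$ of $\Omega$ into itself is said to be measure preserving if:

(1) $T$ is measurable, i.e. $T^{-1}A = \{\omega : T\omega \in A\} \in \mathfrak{F}$ for any $A \in \mathfrak{F}$; and

(2) $T$ preserves the measure: $\mathbf{P}(T^{-1}A) = \mathbf{P}(A)$ for any $A \in \mathfrak{F}$.

Let $T$ be a measure preserving transformation, $T^n$ its $n$-th iteration and $\xi = \xi(\omega)$ be a random variable. Put $U\xi(\omega) = \xi(T\omega)$, so that $U$ is a transformation of random variables, and $U^k\xi(\omega) = \xi(T^k\omega)$. Then

$$\boldsymbol{\xi} = \{U^n\xi(\omega)\}_{0}^{\infty} = \{\xi(T^n\omega)\}_{0}^{\infty} \tag{16.1.2}$$

is a stationary sequence of random variables.

Proof Indeed, let $A = \{\omega : \boldsymbol{\xi} \in B\}$, $B \in \mathfrak{B}^\infty$ and $A_1 = \{\omega : \theta\boldsymbol{\xi} \in B\}$. We have

$$\boldsymbol{\xi} = (\xi(\omega), \xi(T\omega), \ldots), \qquad \theta\boldsymbol{\xi} = (\xi(T\omega), \xi(T^2\omega), \ldots).$$

Therefore $\omega \in A_1$ if and only if $T\omega \in A$, i.e. when $A_1 = T^{-1}A$. But $\mathbf{P}(T^{-1}A) = \mathbf{P}(A)$ and hence $\mathbf{P}(A_1) = \mathbf{P}(A)$, so that $\mathbf{P}(A_n) = \mathbf{P}(A)$ for any $n \ge 1$ as well, where $A_n = \{\omega : \theta^n\boldsymbol{\xi} \in B\}$. □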

Stationary sequences defined by (16.1.2) will be referred to as sequences generated by the transformation $T$.

To be able to construct stationary sequences on the whole axis, we will need measure preserving transformations acting both in “positive” and “negative” directions.

Definition 16.1.3 A transformation $T$ is said to be bidirectional measure preserving if:

(1) $T$ is a one-to-one transformation, the domain and range of $T$ coincide with the whole $\Omega$;

(2) the transformations $T$ and $T^{-1}$ are measurable, i.e.

$$T^{-1}A = \{\omega : T\omega \in A\} \in \mathfrak{F}, \qquad TA = \{T\omega : \omega \in A\} \in \mathfrak{F}$$

for any $A \in \mathfrak{F}$;

(3) the transformation $T$ preserves the measure: $\mathbf{P}(T^{-1}A) = \mathbf{P}(A)$, and therefore $\mathbf{P}(A) = \mathbf{P}(TA)$ for any $A \in \mathfrak{F}$.

For such transformations we can, as before, construct stationary sequences $\boldsymbol{\xi}$ defined on the whole axis:

$$\boldsymbol{\xi} = \{U^n\xi(\omega)\}_{-\infty}^{\infty} = \{\xi(T^n\omega)\}_{-\infty}^{\infty}.$$

The argument before Definition 16.1.2 shows that this approach “exhausts” all stationary sequences given on $(\Omega, \mathfrak{F}, \mathbf{P})$, i.e. to any stationary sequence $\boldsymbol{\xi}$ we can relate a measure preserving transformation $T$ and a random variable $\xi = \xi_0$ such that $\xi_k(\omega) = \xi_0(T^k\omega)$. In this construction, we consider the “sample probability space” $(\mathbb{R}^\infty, \mathfrak{B}^\infty, \mathbf{P})$ for which $\boldsymbol{\xi}(\omega) = \omega$, $\theta = T$. The transformation $\theta = T$ (that is, transformation (16.1.1)) will be called the pathwise shift transformation. It always exists and “generates” any stationary sequence.

Now  we  will  give  some  simpler  examples  of  (bidirectional)  measure  preserving 
transformations. 

Example 16.1.1 Let $\Omega = \{\omega_1, \ldots, \omega_d\}$, $d \ge 2$, be a finite set, $\mathfrak{F}$ be the $\sigma$-algebra of all its subsets, $T\omega_i = \omega_{i+1}$, $1 \le i \le d-1$, and $T\omega_d = \omega_1$. If $\mathbf{P}(\omega_i) = 1/d$ then $T$ and $T^{-1}$ are measure preserving transformations.

Example 16.1.2 Let $\Omega = [0, 1)$, $\mathfrak{F}$ be the $\sigma$-algebra of Borel sets, $\mathbf{P}$ the Lebesgue measure and $s$ a fixed number. Then $T\omega = \omega + s \pmod 1$ is a bidirectional measure preserving transformation.
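
The defining property $\mathbf{P}(T^{-1}A) = \mathbf{P}(A)$ can be checked by brute force on a finite space such as the one in Example 16.1.1. The following is only a minimal sketch: the helper names (`cyclic_shift`, `preimage`, `is_measure_preserving`) and the non-uniform weights `skewed` are illustrative assumptions, not part of the text. It enumerates all subsets $A$ and compares the two probabilities exactly.

```python
from fractions import Fraction
from itertools import combinations

def cyclic_shift(i, d):
    """T(omega_i) = omega_{i+1}, with omega_d mapped back to omega_1 (indices 1..d)."""
    return i % d + 1

def preimage(T, A, omega):
    """T^{-1}A = {w in omega : T(w) in A}."""
    return {w for w in omega if T(w) in A}

def is_measure_preserving(T, omega, prob):
    """Check P(T^{-1}A) = P(A) for every subset A of a finite omega."""
    points = sorted(omega)
    for r in range(len(points) + 1):
        for subset in combinations(points, r):
            A = set(subset)
            p_pre = sum(prob[w] for w in preimage(T, A, omega))
            p_set = sum(prob[w] for w in A)
            if p_pre != p_set:
                return False
    return True

d = 5
omega = set(range(1, d + 1))
uniform = {w: Fraction(1, d) for w in omega}                 # P(omega_i) = 1/d
skewed = {w: Fraction(w, d * (d + 1) // 2) for w in omega}   # hypothetical non-uniform weights
T = lambda w: cyclic_shift(w, d)

print(is_measure_preserving(T, omega, uniform))
print(is_measure_preserving(T, omega, skewed))
```

Under the uniform weights the check succeeds for every subset, while for the (assumed) non-uniform weights it already fails on a one-point set, which illustrates why the condition $\mathbf{P}(\omega_i) = 1/d$ appears in the example.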

In these examples, the spaces $\Omega$ are rather small, which allows one to construct on them only stationary sequences with deterministic or almost deterministic dependence between their elements. If we choose in Example 16.1.1 the variable $\xi$ so that all $\xi(\omega_i)$ are different, then the value $\xi_k(\omega) = \xi(T^k\omega)$ will uniquely determine $T^k\omega$ and thereby $T^{k+1}\omega$ and $\xi_{k+1}(\omega)$. The same can be said of Example 16.1.2 in the case when $\xi(\omega)$, $\omega \in [0, 1)$, is a monotone function of $\omega$.

As our argument at the beginning of the section shows, the space $\Omega = \mathbb{R}^\infty$ is large enough to construct on it any stationary sequence.

Thus, we see that the concept of a measure preserving transformation arises in a natural way when studying stationary processes. But not only in that case. It also arises, for instance, while studying the dynamics of some physical systems. Indeed, the whole above argument remains valid if we consider on $(\Omega, \mathfrak{F})$ an arbitrary measure $\mu$ instead of the probability $\mathbf{P}$. For example, for $\Omega = \mathbb{R}^\infty$, the value $\mu(A)$, $A \in \mathfrak{F}$, could be the Lebesgue measure (volume) of the set $A$. The measure preserving property of the transformation $T$ will mean that any set $A$, after the transformation $T$ has acted on it (which, say, corresponds to the change of the physical system’s state in one unit of time), will retain its volume. This property is rather natural for incompressible liquids. Many laws to be established below will be equally applicable to such physical systems.

Returning to probabilistic models, i.e. to the case when the measure is a probability distribution, it turns out that, in that case, for any set $A$ with $\mathbf{P}(A) > 0$, the “trajectory” $T^n\omega$ will visit $A$ infinitely often for almost all (with respect to the measure $\mathbf{P}$) $\omega \in A$.

Theorem 16.1.1 (Poincaré) Let $T$ be a measure preserving transformation and $A \in \mathfrak{F}$. Then, for almost all $\omega \in A$, the relation $T^n\omega \in A$ holds for infinitely many $n \ge 1$.

Proof Put $N := \{\omega \in A : T^n\omega \notin A \text{ for all } n \ge 1\}$. Because $\{\omega : T^n\omega \in A\} \in \mathfrak{F}$, it is not hard to see that $N \in \mathfrak{F}$. Clearly, $N \cap T^{-n}N = \varnothing$ for any $n \ge 1$, and $T^{-m}N \cap T^{-(m+n)}N = T^{-m}(N \cap T^{-n}N) = \varnothing$. This means that we have infinitely many sets $T^{-n}N$, $n = 0, 1, 2, \ldots$, which are disjoint and have one and the same probability. This evidently implies that $\mathbf{P}(N) = 0$.

Thus, for each $\omega \in A \setminus N$, there exists an $n_1 = n_1(\omega)$ such that $T^{n_1}\omega \in A$. Now we apply this assertion to the measure preserving mapping $T_k = T^k$, $k \ge 1$. Then, for each $\omega \in A \setminus N_k$, $\mathbf{P}(N_k) = 0$, there exists an $n_k = n_k(\omega) \ge 1$ such that $(T_k)^{n_k}\omega \in A$. Since $k n_k \ge k$, the theorem is proved. □

Corollary 16.1.1 Let $\xi(\omega) \ge 0$ and $A = \{\omega : \xi(\omega) > 0\}$. Then, for almost all $\omega \in A$,

$$\sum_{n=0}^{\infty} \xi(T^n\omega) = \infty.$$

Proof Put $A_k = \{\omega : \xi(\omega) \ge 1/k\} \subset A$. Then by Theorem 16.1.1 the above series diverges for almost all $\omega \in A_k$. It remains to notice that $A = \bigcup_k A_k$. □

Remark 16.1.1 Formally, one does not need the condition $\mathbf{P}(A) > 0$ in Theorem 16.1.1 and Corollary 16.1.1. However, in the absence of that condition, the assertions may become meaningless, since the set $A \setminus N$ in the proof of Theorem 16.1.1 can turn out to be empty. Suppose, for example, that in the conditions of Example 16.1.2, $A$ is a one-point set: $A = \{\omega\}$, $\omega \in [0, 1)$. If $s$ is irrational, then $T^k\omega$ will never be in $A$ for $k \ge 1$. Indeed, if we assume the contrary, then we will infer that there exist integers $k$ and $m$ such that $\omega + sk - m = \omega$, $s = m/k$, which contradicts the irrationality of $s$.


16.2  Ergodicity  (Metric  Transitivity),  Mixing  and  Weak 
Dependence 

Definition 16.2.1 A set $A \in \mathfrak{F}$ is said to be invariant (with respect to a measure preserving transformation $T$) if $T^{-1}A = A$. A set $A \in \mathfrak{F}$ is said to be almost invariant if the sets $T^{-1}A$ and $A$ differ from each other by a set of probability zero: $\mathbf{P}(A \oplus T^{-1}A) = 0$, where $A \oplus B = A\overline{B} \cup \overline{A}B$ is the symmetric difference.

It is evident that the class of all invariant (almost invariant) sets forms a $\sigma$-algebra which will be denoted by $\mathfrak{I}$ ($\mathfrak{I}^*$).

Lemma 16.2.1 If $A$ is an almost invariant set then there exists an invariant set $B$ such that $\mathbf{P}(A \oplus B) = 0$.

Proof Put $B = \limsup_{n\to\infty} T^{-n}A$ (recall that $\limsup_{n\to\infty} A_n = \bigcap_{n=1}^{\infty} \bigcup_{k \ge n} A_k$ is the set of all points which belong to infinitely many sets $A_k$). Then

$$T^{-1}B = \limsup_{n\to\infty} T^{-(n+1)}A = B,$$

i.e. $B \in \mathfrak{I}$. It is not hard to see that

$$A \oplus B \subset \bigcup_{k=0}^{\infty} \bigl(T^{-k}A \oplus T^{-(k+1)}A\bigr).$$

Since

$$\mathbf{P}\bigl(T^{-k}A \oplus T^{-(k+1)}A\bigr) = \mathbf{P}\bigl(A \oplus T^{-1}A\bigr) = 0,$$

we have $\mathbf{P}(A \oplus B) = 0$. The lemma is proved. □

Definition 16.2.2 A measure preserving transformation $T$ is said to be ergodic (or metric transitive) if each invariant set has probability zero or one.

A stationary sequence $\{\xi_k\}$ associated with such $T$ (i.e. the sequence which generated $T$ or was generated by $T$) is also said to be ergodic (metric transitive).

Lemma 16.2.2 A transformation $T$ is ergodic if and only if each almost invariant set has probability 0 or 1.

Proof Let $T$ be ergodic and $A \in \mathfrak{I}^*$. Then by Lemma 16.2.1 there exists an invariant set $B$ such that $\mathbf{P}(A \oplus B) = 0$. Because $\mathbf{P}(B) = 0$ or 1, the probability $\mathbf{P}(A) = 0$ or 1. The converse assertion is obvious. □

Definition 16.2.3 A random variable $\zeta = \zeta(\omega)$ is said to be invariant (almost invariant) if $\zeta(\omega) = \zeta(T\omega)$ for all $\omega \in \Omega$ (for almost all $\omega \in \Omega$).

Theorem 16.2.1 Let $T$ be a measure preserving transformation. The following three conditions are equivalent:

(1) $T$ is ergodic;

(2) each almost invariant random variable is a.s. constant;

(3) each invariant random variable is a.s. constant.

Proof (1) $\Rightarrow$ (2). Assume that $T$ is ergodic and $\xi$ is almost invariant, i.e. $\xi(\omega) = \xi(T\omega)$ a.s. Then, for any $v \in \mathbb{R}$, we have $A_v := \{\omega : \xi(\omega) < v\} \in \mathfrak{I}^*$ and, by Lemma 16.2.2, $\mathbf{P}(A_v)$ equals 0 or 1. Put $V := \sup\{v : \mathbf{P}(A_v) = 0\}$. Since $A_v \uparrow \Omega$ as $v \uparrow \infty$ and $A_v \downarrow \varnothing$ as $v \downarrow -\infty$, one has $|V| < \infty$ and

$$\mathbf{P}(\xi(\omega) < V) = \mathbf{P}\Bigl(\bigcup_{n=1}^{\infty} \Bigl\{\xi(\omega) < V - \frac{1}{n}\Bigr\}\Bigr) = 0.$$

Similarly, $\mathbf{P}(\xi(\omega) > V) = 0$. Therefore $\mathbf{P}(\xi(\omega) = V) = 1$.

(2) $\Rightarrow$ (3). Obvious.

(3) $\Rightarrow$ (1). Let $A \in \mathfrak{I}$. Then the indicator function $\mathrm{I}_A$ is an invariant random variable, and since it is constant, one has either $\mathrm{I}_A = 0$ or $\mathrm{I}_A = 1$ a.s. This implies that $\mathbf{P}(A)$ equals 0 or 1. The theorem is proved. □

The assertion of the theorem clearly remains valid if one considers in (3) only bounded random variables. Moreover, if $\xi$ is invariant, then the truncated variable $\xi^{(N)} = \min(\xi, N)$ is also invariant.

Returning to Examples 16.1.1 and 16.1.2, in Example 16.1.1,

$$\Omega = \{\omega_1, \ldots, \omega_d\}, \qquad T\omega_i = \omega_{i+1 \ (\mathrm{mod}\ d)}, \qquad \mathbf{P}(\omega_i) = 1/d.$$

The transformation $T$ is obviously metric transitive.

In Example 16.1.2, $\Omega = [0, 1)$, $T\omega = \omega + s \pmod 1$, and $\mathbf{P}$ is the Lebesgue measure. We will now show that $T$ is ergodic if and only if $s$ is irrational.

Consider a square integrable random variable $\xi = \xi(\omega)$: $\mathbf{E}\xi^2(\omega) < \infty$. Then by the Parseval equality, the Fourier series

$$\xi(\omega) = \sum_{n=0}^{\infty} a_n e^{2\pi i n \omega}$$

for this function has the property $\sum_{n=0}^{\infty} |a_n|^2 < \infty$. Assume that $s$ is irrational, while $\xi$ is invariant. Then

$$a_n = \mathbf{E}\,\xi(\omega) e^{-2\pi i n \omega} = \mathbf{E}\,\xi(T\omega) e^{-2\pi i n T\omega} = e^{-2\pi i n s}\,\mathbf{E}\,\xi(T\omega) e^{-2\pi i n \omega} = e^{-2\pi i n s}\,\mathbf{E}\,\xi(\omega) e^{-2\pi i n \omega} = e^{-2\pi i n s} a_n.$$

For irrational $s$, this equality is only possible when $a_n = 0$, $n \ge 1$, and $\xi(\omega) = a_0 = \mathrm{const}$. By Theorem 16.2.1 this means that $T$ is ergodic.

Now let $s = m/n$ be rational ($m$ and $n$ are integers). Then the set

$$A = \bigcup_{k=0}^{n-1} \Bigl\{\omega : \frac{2k}{2n} \le \omega < \frac{2k+1}{2n}\Bigr\}$$

will be invariant and $\mathbf{P}(A) = 1/2$. This means that $T$ is not ergodic. □
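
The dichotomy for the rotation $T\omega = \omega + s \pmod 1$ can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the helper names (`rotate`, `in_A`, `time_average`), the choice $s = 2/5$ and the starting points are hypothetical. It takes the set $A$ made of the intervals $[2k/(2n), (2k+1)/(2n))$, $k = 0, \ldots, n-1$, and computes the fraction of time a trajectory spends in $A$. For rational $s = m/n$ exact rational arithmetic shows that the trajectory never leaves (or never enters) $A$; for an irrational $s$ a floating-point run suggests that the time average approaches $\mathbf{P}(A) = 1/2$.

```python
import math
from fractions import Fraction

def rotate(w, s):
    """One step of the rotation T(w) = w + s (mod 1)."""
    return (w + s) % 1

def in_A(w, n):
    """Indicator of A = union over k of [2k/(2n), (2k+1)/(2n))."""
    return int(w * 2 * n) % 2 == 0

def time_average(w, s, n_steps, indicator):
    """Fraction of the first n_steps points of the trajectory that satisfy indicator."""
    hits = 0
    for _ in range(n_steps):
        hits += indicator(w)
        w = rotate(w, s)
    return Fraction(hits, n_steps)

n = 5
ind = lambda w: in_A(w, n)

# Rational s = 2/5: A is invariant, so the time average is 1 or 0, never P(A) = 1/2.
s_rational = Fraction(2, 5)
avg_inside = time_average(Fraction(1, 40), s_rational, 1000, ind)
avg_outside = time_average(Fraction(3, 20), s_rational, 1000, ind)
print(avg_inside, avg_outside)

# Irrational s (floating-point approximation): the time average is close to 1/2.
s_irrational = math.sqrt(2) - 1
print(float(time_average(0.1, s_irrational, 10000, ind)))
```

The rational case is computed exactly with `Fraction`, so the invariance of $A$ is verified without rounding; the irrational case necessarily relies on a floating-point approximation of $s$ and is only indicative.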


Definition 16.2.4 A measure preserving transformation $T$ is called mixing if, for any $A_1, A_2 \in \mathfrak{F}$, as $n \to \infty$,

$$\mathbf{P}(A_1 \cap T^{-n}A_2) \to \mathbf{P}(A_1)\mathbf{P}(A_2). \tag{16.2.1}$$

Now consider the stationary sequence $\boldsymbol{\xi} = (\xi_0, \xi_1, \ldots)$ generated by the transformation $T$: $\xi_k(\omega) = \xi_0(T^k\omega)$.

Definition 16.2.5 A stationary sequence $\boldsymbol{\xi}$ is said to be weakly dependent if $\xi_k$ and $\xi_{k+n}$ are asymptotically independent as $n \to \infty$, i.e. for any $B_1, B_2 \in \mathfrak{B}$,

$$\mathbf{P}(\xi_k \in B_1, \xi_{k+n} \in B_2) \to \mathbf{P}(\xi_0 \in B_1)\mathbf{P}(\xi_0 \in B_2). \tag{16.2.2}$$


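Theorem 16.2.2 A measure preserving transformation $T$ is mixing if and only if any stationary sequence $\boldsymbol{\xi}$ generated by $T$ is weakly dependent.

Proof Let $T$ be mixing. Put $A_i := \xi_0^{-1}(B_i)$, $i = 1, 2$, and set $k = 0$ in (16.2.2). Then

$$\mathbf{P}(\xi_0 \in B_1, \xi_n \in B_2) = \mathbf{P}(A_1 \cap T^{-n}A_2) \to \mathbf{P}(A_1)\mathbf{P}(A_2).$$

Now assume any sequence generated by $T$ is weakly dependent. For any given $A_1, A_2 \in \mathfrak{F}$, define the random variable

$$\xi(\omega) = \begin{cases} 0 & \text{if } \omega \notin A_1 \cup A_2; \\ 1 & \text{if } \omega \in A_1\overline{A}_2; \\ 2 & \text{if } \omega \in A_1 A_2; \\ 3 & \text{if } \omega \in \overline{A}_1 A_2; \end{cases}$$

and put $\xi_k := \xi(T^k\omega)$. Then, as $n \to \infty$,

$$\mathbf{P}(A_1 \cap T^{-n}A_2) = \mathbf{P}(0 < \xi_0 < 3, \ \xi_n \ge 2) \to \mathbf{P}(0 < \xi_0 < 3)\,\mathbf{P}(\xi_0 \ge 2) = \mathbf{P}(A_1)\mathbf{P}(A_2).$$

The theorem is proved. □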


Let $\{X_n\}$ be a stationary real-valued Markov chain with an invariant distribution $\pi$ that satisfies the conditions of the ergodic theorem, i.e. such that, for any $B \in \mathfrak{B}$ and $x \in \mathbb{R}$, as $n \to \infty$,

$$\mathbf{P}(X_n \in B \mid X_0 = x) \to \pi(B).$$

Then $\{X_n\}$ is weakly dependent, and therefore, by Theorem 16.2.2, the respective transformation $T$ is mixing. Indeed,

$$\mathbf{P}(X_0 \in B_1, X_n \in B_2) = \mathbf{E}\,\mathrm{I}(X_0 \in B_1)\,\mathbf{P}(X_n \in B_2 \mid X_0),$$

where the last factor converges to $\pi(B_2)$ for each $X_0$. Therefore the above probability tends to $\pi(B_2)\pi(B_1)$.

Further  characterisations  of  the  mixing  property  will  be  given  in  Theorems  16.2.4 
and  16.2.5. 

Now  we  will  introduce  some  notions  that  are  somewhat  broader  than  those  from 
Definitions  16.2.4  and  16.2.5. 

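Definition 16.2.6 A transformation $T$ is called mixing on the average if, as $n \to \infty$,

$$\frac{1}{n} \sum_{k=1}^{n} \mathbf{P}(A_1 \cap T^{-k}A_2) \to \mathbf{P}(A_1)\mathbf{P}(A_2). \tag{16.2.3}$$

A stationary sequence $\boldsymbol{\xi}$ is said to be weakly dependent on the average if

$$\frac{1}{n} \sum_{k=1}^{n} \mathbf{P}(\xi_0 \in B_1, \xi_k \in B_2) \to \mathbf{P}(\xi_0 \in B_1)\mathbf{P}(\xi_0 \in B_2). \tag{16.2.4}$$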

Theorem 16.2.3 A measure preserving transformation $T$ is mixing on the average if and only if any stationary sequence $\boldsymbol{\xi}$ generated by $T$ is weakly dependent on the average.

The proof is the same as for Theorem 16.2.2, and is left to the reader. □


If $\{X_n\}$ is a periodic real-valued Markov chain with period $d$ such that each of the embedded sub-chains $\{X_{i+nd}\}_{n=0}^{\infty}$, $i = 0, \ldots, d-1$, satisfies the ergodicity conditions with invariant distributions $\pi^{(i)}$ on disjoint sets $\mathcal{X}_0, \ldots, \mathcal{X}_{d-1}$, then the “common” invariant distribution $\pi$ will be equal to $d^{-1}\sum_{i=0}^{d-1} \pi^{(i)}$ and the chain $\{X_n\}$ will be weakly dependent on the average. At the same time, it will clearly not be weakly dependent for $d > 1$.

Theorem 16.2.4 A measure preserving transformation $T$ is ergodic if and only if it is mixing on the average.

Proof Let $T$ be mixing on the average, and $A_1 \in \mathfrak{F}$, $A_2 \in \mathfrak{I}$. Then $A_2 = T^{-k}A_2$ and hence $\mathbf{P}(A_1 \cap T^{-k}A_2) = \mathbf{P}(A_1 A_2)$ for all $k \ge 1$. Therefore, (16.2.3) means that $\mathbf{P}(A_1 A_2) = \mathbf{P}(A_1)\mathbf{P}(A_2)$. For $A_1 = A_2$ we get $\mathbf{P}(A_2) = \mathbf{P}^2(A_2)$, and consequently $\mathbf{P}(A_2)$ equals 0 or 1.

We postpone the proof of the converse assertion until the next section. □

Now  we  will  give  one  more  important  property  of  ergodic  transforms. 


Theorem 16.2.5 A measure preserving transformation $T$ is ergodic if and only if, for any $A \in \mathfrak{F}$ with $\mathbf{P}(A) > 0$, one has

$$\mathbf{P}\Bigl(\bigcup_{n=0}^{\infty} T^{-n}A\Bigr) = 1. \tag{16.2.5}$$

Note that property (16.2.5) means that the sets $T^{-n}A$, $n = 0, 1, \ldots$, “exhaust” the whole space $\Omega$, which associates well with the term “mixing”.

Proof Let $T$ be ergodic. Put $B := \bigcup_{n=0}^{\infty} T^{-n}A$. Then $T^{-1}B \subset B$. Because $T$ is measure preserving, one also has that $\mathbf{P}(T^{-1}B) = \mathbf{P}(B)$. From this it follows that $T^{-1}B = B$ up to a set of measure 0 and therefore $B$ is almost invariant. Since $T$ is ergodic, $\mathbf{P}(B)$ equals 0 or 1. But $\mathbf{P}(B) \ge \mathbf{P}(A) > 0$, and hence $\mathbf{P}(B) = 1$.

Conversely, if $T$ is not ergodic, then there exists an invariant set $A$ such that $0 < \mathbf{P}(A) < 1$ and, therefore, for this set $T^{-n}A = A$ holds and

$$\mathbf{P}(B) = \mathbf{P}(A) < 1.$$

The theorem is proved. □

Remark 16.2.1 In Sects. 16.1 and 16.2 we tacitly or explicitly assumed (mainly for the sake of simplicity of the exposition) that the components $\xi_k$ of the stationary sequence $\boldsymbol{\xi}$ are real. However, we never actually used this, and so we could, as we did while studying Markov chains, assume that the state space $\mathcal{X}$, in which $\xi_k$ take their values, is an arbitrary measurable space. In the next section we will substantially use the fact that $\xi_k$ are real- or vector-valued.


16.3  The  Ergodic  Theorem 


For a sequence $\xi_1, \xi_2, \ldots$ of independent identically distributed random variables we proved in Chap. 11 the strong law of large numbers:

$$\frac{S_n}{n} \xrightarrow{\text{a.s.}} \mathbf{E}\xi_1, \qquad \text{where } S_n := \sum_{k=0}^{n-1} \xi_k.$$

Now we will prove the same assertion under much broader assumptions, namely for stationary ergodic sequences, i.e. for sequences that are weakly dependent on the average.

Let $\{\xi_k\}$ be an arbitrary strictly stationary sequence, $T$ be the associated measure preserving transformation, and $\mathfrak{I}$ be the $\sigma$-algebra of invariant sets.


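Theorem 16.3.1 (Birkhoff-Khintchin) If $\mathbf{E}|\xi_0| < \infty$ then

$$\frac{1}{n} \sum_{k=0}^{n-1} \xi_k \xrightarrow{\text{a.s.}} \mathbf{E}(\xi_0 \mid \mathfrak{I}). \tag{16.3.1}$$

If the sequence $\{\xi_k\}$ (or transformation $T$) is ergodic, then

$$\frac{1}{n} \sum_{k=0}^{n-1} \xi_k \xrightarrow{\text{a.s.}} \mathbf{E}\xi_0. \tag{16.3.2}$$

Below we will be using the representation $\xi_k = \xi(T^k\omega)$ for $\xi = \xi_0$. We will need the following auxiliary result.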


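Lemma 16.3.1 Set

$$S_n(\omega) := \sum_{k=0}^{n-1} \xi(T^k\omega), \qquad M_k(\omega) := \max\{0, S_1(\omega), \ldots, S_k(\omega)\}.$$

Then, under the conditions of Theorem 16.3.1,

$$\mathbf{E}\bigl[\xi(\omega)\,\mathrm{I}_{\{M_n > 0\}}(\omega)\bigr] \ge 0$$

for any $n \ge 1$.

Proof For all $k \le n$, one has $S_k(T\omega) \le M_n(T\omega)$, and hence

$$\xi(\omega) + M_n(T\omega) \ge \xi(\omega) + S_k(T\omega) = S_{k+1}(\omega).$$

Because $\xi(\omega) \ge S_1(\omega) - M_n(T\omega)$, we have

$$\xi(\omega) \ge \max\bigl(S_1(\omega), \ldots, S_n(\omega)\bigr) - M_n(T\omega).$$

Further, since

$$\{M_n(\omega) > 0\} = \bigl\{\max\bigl(S_1(\omega), \ldots, S_n(\omega)\bigr) > 0\bigr\},$$

we obtain that

$$\begin{aligned}
\mathbf{E}\bigl[\xi(\omega)\,\mathrm{I}_{\{M_n > 0\}}(\omega)\bigr]
&\ge \mathbf{E}\bigl(\max(S_1(\omega), \ldots, S_n(\omega)) - M_n(T\omega)\bigr)\,\mathrm{I}_{\{M_n > 0\}}(\omega) \\
&\ge \mathbf{E}\bigl(M_n(\omega) - M_n(T\omega)\bigr)\,\mathrm{I}_{\{M_n > 0\}}(\omega) \\
&\ge \mathbf{E}\bigl(M_n(\omega) - M_n(T\omega)\bigr) = 0.
\end{aligned}$$

The lemma is proved. □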

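Proof of Theorem 16.3.1 Assertion (16.3.2) is an evident consequence of (16.3.1), because, for ergodic $T$, the $\sigma$-algebra $\mathfrak{I}$ is trivial and $\mathbf{E}(\xi \mid \mathfrak{I}) = \mathbf{E}\xi$ a.s. Hence, it suffices to prove (16.3.1).

Without loss of generality, we can assume that $\mathbf{E}(\xi \mid \mathfrak{I}) = 0$, for one can always consider $\xi - \mathbf{E}(\xi \mid \mathfrak{I})$ instead of $\xi$.

Let $\overline{S} := \limsup_{n\to\infty} n^{-1}S_n$ and $\underline{S} := \liminf_{n\to\infty} n^{-1}S_n$. To prove the theorem, it suffices to establish that

$$0 \le \underline{S} \le \overline{S} \le 0 \quad \text{a.s.} \tag{16.3.3}$$

Since $\overline{S}(\omega) = \overline{S}(T\omega)$, the random variable $\overline{S}$ is invariant and hence the set $A_\varepsilon = \{\overline{S}(\omega) > \varepsilon\}$ is also invariant for any $\varepsilon > 0$. Introduce the variables

$$\xi^*(\omega) := \bigl(\xi(\omega) - \varepsilon\bigr)\mathrm{I}_{A_\varepsilon}(\omega),$$

$$S_k^*(\omega) := \xi^*(\omega) + \cdots + \xi^*(T^{k-1}\omega),$$

$$M_k^*(\omega) := \max(0, S_1^*, \ldots, S_k^*).$$

Then, by Lemma 16.3.1, for any $n \ge 1$, one has

$$\mathbf{E}\,\xi^*\mathrm{I}_{\{M_n^* > 0\}} \ge 0.$$

But, as $n \to \infty$,

$$\{M_n^* > 0\} = \Bigl\{\max_{1 \le k \le n} S_k^* > 0\Bigr\} \uparrow \Bigl\{\sup_{k \ge 1} S_k^* > 0\Bigr\} = \Bigl\{\sup_{k \ge 1} \frac{S_k^*}{k} > 0\Bigr\} = \Bigl\{\sup_{k \ge 1} \frac{S_k}{k} > \varepsilon\Bigr\} \cap A_\varepsilon = A_\varepsilon.$$

The last equality follows from the observation that

$$A_\varepsilon = \{\overline{S} > \varepsilon\} \subset \Bigl\{\sup_{k \ge 1} \frac{S_k}{k} > \varepsilon\Bigr\}.$$

Further, $\mathbf{E}|\xi^*| \le \mathbf{E}|\xi| + \varepsilon$. Hence, by the dominated convergence theorem,

$$0 \le \mathbf{E}\,\xi^*\mathrm{I}_{\{M_n^* > 0\}} \to \mathbf{E}\,\xi^*\mathrm{I}_{A_\varepsilon}.$$

Consequently,

$$0 \le \mathbf{E}\,\xi^*\mathrm{I}_{A_\varepsilon} = \mathbf{E}(\xi - \varepsilon)\mathrm{I}_{A_\varepsilon} = \mathbf{E}\,\xi\,\mathrm{I}_{A_\varepsilon} - \varepsilon\mathbf{P}(A_\varepsilon) = \mathbf{E}\,\mathrm{I}_{A_\varepsilon}\mathbf{E}(\xi \mid \mathfrak{I}) - \varepsilon\mathbf{P}(A_\varepsilon) = -\varepsilon\mathbf{P}(A_\varepsilon).$$

This implies that $\mathbf{P}(A_\varepsilon) = 0$ for any $\varepsilon > 0$, and therefore $\mathbf{P}(\overline{S} \le 0) = 1$.

In a similar way, considering the variables $-\xi$ instead of $\xi$, we obtain that

$$\limsup_{n\to\infty}\Bigl(-\frac{S_n}{n}\Bigr) = -\liminf_{n\to\infty}\frac{S_n}{n} = -\underline{S},$$

and $\mathbf{P}(-\underline{S} \le 0) = 1$, $\mathbf{P}(\underline{S} \ge 0) = 1$. The required inequalities (16.3.3), and therefore the theorem itself, are proved. □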


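Now we can complete the

Proof of Theorem 16.2.4 It remains to show that the ergodicity of $T$ implies mixing on the average. Indeed, let $T$ be ergodic and $A_1, A_2 \in \mathfrak{F}$. Then, by Theorem 16.3.1, we have

$$\zeta_n := \frac{1}{n}\sum_{k=1}^{n} \mathrm{I}(T^{-k}A_2) \xrightarrow{\text{a.s.}} \mathbf{P}(A_2), \qquad \mathrm{I}(A_1)\zeta_n \xrightarrow{\text{a.s.}} \mathrm{I}(A_1)\mathbf{P}(A_2).$$

Since $\zeta_n \mathrm{I}(A_1)$ are bounded, one also has the convergence

$$\mathbf{E}\,\zeta_n\mathrm{I}(A_1) \to \mathbf{P}(A_2) \cdot \mathbf{P}(A_1).$$

Therefore

$$\frac{1}{n}\sum_{k=1}^{n} \mathbf{P}(A_1 \cap T^{-k}A_2) = \mathbf{E}\,\mathrm{I}(A_1)\zeta_n \to \mathbf{P}(A_1)\mathbf{P}(A_2).$$

The theorem is proved. □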


Now we will show that convergence in mean also holds in (16.3.1) and (16.3.2).

Theorem 16.3.2 Under the assumptions of Theorem 16.3.1, one has along with (16.3.1) and (16.3.2) that, respectively,

$$\mathbf{E}\Bigl|\frac{1}{n}\sum_{k=0}^{n-1} \xi_k - \mathbf{E}(\xi_0 \mid \mathfrak{I})\Bigr| \to 0 \tag{16.3.4}$$

and

$$\mathbf{E}\Bigl|\frac{1}{n}\sum_{k=0}^{n-1} \xi_k - \mathbf{E}\xi_0\Bigr| \to 0 \tag{16.3.5}$$

as $n \to \infty$.

Proof The assertion of the theorem follows in an obvious way from Theorems 16.3.1, 6.1.7 and the uniform integrability of the sums

$$\frac{1}{n}\sum_{k=0}^{n-1} \xi_k,$$

which follows from Theorem 6.1.6. □


Corollary 16.3.1 If $\{\xi_k\}$ is a stationary metric transitive sequence and $a = \mathbf{E}\xi_k < 0$, then $S(\omega) = \sup_{k \ge 0} S_k(\omega)$ is a proper random variable.

The proof is obvious since, for $0 < \varepsilon < -a$, one has $S_k < (a + \varepsilon)k < 0$ for all $k \ge n(\omega) < \infty$. □

An unusual feature of Theorem 16.3.1 when compared with the strong law of large numbers from Chap. 11 is that the limit of

$$\frac{1}{n}\sum_{k=0}^{n-1} \xi_k$$

can be a random variable. For instance, let $T\omega_k := \omega_{k+2}$ and $d = 2l$ be even in the situation of Example 16.1.1. Then the transformation $T$ will not be ergodic, since the set $A = \{\omega_1, \omega_3, \ldots, \omega_{d-1}\}$ will be invariant, while $\mathbf{P}(A) = 1/2$.

On the other hand, it is evident that, for any function $\xi(\omega)$, the sum

$$\frac{1}{n}\sum_{k=0}^{n-1} \xi(T^k\omega)$$

will converge with probability $1/2$ to

$$\frac{2}{d}\sum_{j=0}^{l-1} \xi(\omega_{2j+1})$$

(if $\omega = \omega_i$ and $i$ is odd) and with probability $1/2$ to

$$\frac{2}{d}\sum_{j=1}^{l} \xi(\omega_{2j})$$

(if $\omega = \omega_i$ and $i$ is even). This limiting distribution is just the distribution of $\mathbf{E}(\xi \mid \mathfrak{I})$.
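
The random limit in this example is easy to reproduce exactly. The sketch below is illustrative only: the helper names (`shift_by_two`, `birkhoff_average`), the value $d = 6$ and the test function $\xi(\omega_i) = i^2$ are assumptions made for the illustration. It evaluates the time average $n^{-1}\sum_{k<n} \xi(T^k\omega)$ for the shift $T\omega_k = \omega_{k+2}$ on $d = 2l$ points, taking $n$ to be a multiple of $l$ so that the average runs over whole cycles.

```python
from fractions import Fraction

def shift_by_two(i, d):
    """T(omega_i) = omega_{i+2}, with indices 1..d taken cyclically."""
    return (i + 1) % d + 1

def birkhoff_average(xi, start, d, n):
    """Time average (1/n) * sum_{k<n} xi(T^k omega) for omega = omega_start."""
    i, total = start, Fraction(0)
    for _ in range(n):
        total += xi[i]
        i = shift_by_two(i, d)
    return total / n

d = 6                                    # d = 2l with l = 3
xi = {i: Fraction(i * i) for i in range(1, d + 1)}   # hypothetical test function
n = 300                                  # multiple of l, so whole cycles are averaged

odd_limit = Fraction(2, d) * sum(xi[i] for i in range(1, d + 1, 2))
even_limit = Fraction(2, d) * sum(xi[i] for i in range(2, d + 1, 2))
mean = sum(xi.values()) / d              # E xi under the uniform distribution

print(birkhoff_average(xi, 1, d, n), odd_limit)
print(birkhoff_average(xi, 2, d, n), even_limit)
print(mean)
```

The two time averages differ from each other and from $\mathbf{E}\xi$, while their equally weighted mean equals $\mathbf{E}\xi$, in agreement with the interpretation of the limit as $\mathbf{E}(\xi \mid \mathfrak{I})$.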


Chapter  17 

Stochastic  Recursive  Sequences 


Abstract The chapter begins by introducing the concept of stochastic recursive sequences in Sect. 17.1. The idea of renovating events, together with the key results on ergodicity of stochastic recursive sequences and the boundedness thereof, is presented in Sect. 17.2, whereas the Loynes ergodic theorem for the case of monotone functions specifying the recursion is proved in Sect. 17.3. Section 17.4 establishes ergodicity conditions for contracting in mean Lipschitz transformations.


17.1  Basic  Concepts 

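Consider two measurable state spaces $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ and $(\mathcal{Y}, \mathfrak{B}_{\mathcal{Y}})$, and let $\{\xi_n\}$ be a sequence of random elements taking values in $\mathcal{Y}$. If $(\Omega, \mathfrak{F}, \mathbf{P})$ is the underlying probability space, then $\{\omega : \xi_k \in B\} \in \mathfrak{F}$ for any $B \in \mathfrak{B}_{\mathcal{Y}}$. Assume, moreover, that a measurable function $f : \mathcal{X} \times \mathcal{Y} \to \mathcal{X}$ is given on the measurable space $(\mathcal{X} \times \mathcal{Y}, \mathfrak{B}_{\mathcal{X}} \times \mathfrak{B}_{\mathcal{Y}})$, where $\mathfrak{B}_{\mathcal{X}} \times \mathfrak{B}_{\mathcal{Y}}$ denotes the $\sigma$-algebra generated by sets $A \times B$ with $A \in \mathfrak{B}_{\mathcal{X}}$ and $B \in \mathfrak{B}_{\mathcal{Y}}$.

For simplicity’s sake, by $\mathcal{X}$ and $\mathcal{Y}$ we can understand the real line $\mathbb{R}$, and by $\mathfrak{B}_{\mathcal{X}}$, $\mathfrak{B}_{\mathcal{Y}}$ the $\sigma$-algebras of Borel sets.

Definition 17.1.1 A sequence $\{X_n\}$, $n = 0, 1, \ldots$, taking values in $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ is said to be a stochastic recursive sequence (s.r.s.) driven by the sequence $\{\xi_n\}$ if $X_n$ satisfies the relation

$$X_{n+1} = f(X_n, \xi_n) \tag{17.1.1}$$

for all $n \ge 0$. For simplicity’s sake we will assume that the initial state $X_0$ is independent of $\{\xi_n\}$.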

The distribution of the sequence $\{X_n, \xi_n\}$ on $\bigl((\mathcal{X} \times \mathcal{Y})^\infty, (\mathfrak{B}_{\mathcal{X}} \times \mathfrak{B}_{\mathcal{Y}})^\infty\bigr)$ can be constructed in an obvious way from finite-dimensional distributions similarly to the manner in which we constructed on $(\mathcal{X}^\infty, \mathfrak{B}_{\mathcal{X}}^\infty)$ the distribution of a Markov chain $X$ with values in $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ from its transition function $P(x, B) = \mathbf{P}(X_1(x) \in B)$. The finite-dimensional distributions of $\{(X_0, \xi_0), \ldots, (X_k, \xi_k)\}$ for the s.r.s. are given by the relations

$$\mathbf{P}(X_l \in A_l, \ \xi_l \in B_l; \ l = 0, \ldots, k) = \int_{B_0} \cdots \int_{B_k} \mathbf{P}(\xi_l \in dy_l, \ l = 0, \ldots, k) \prod_{l=1}^{k} \mathrm{I}\bigl(f_l(X_0, y_0, \ldots, y_{l-1}) \in A_l\bigr),$$

where $f_1(x, y_0) := f(x, y_0)$, $f_l(x, y_0, \ldots, y_{l-1}) := f\bigl(f_{l-1}(x, y_0, \ldots, y_{l-2}), y_{l-1}\bigr)$.

Without loss of generality, the sequence $\{\xi_n\}$ can be assumed to be given for all $-\infty < n < \infty$ (as we noted in Sect. 16.1, for a stationary sequence, the required extension to $n < 0$ can always be achieved with the help of Kolmogorov’s theorem).

A stochastic recursive sequence is a more general object than a Markov chain. It is evident that if $\xi_k$ are independent, then the $X_n$ form a Markov chain. A stronger assertion is true as well: under broad assumptions about the space $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$, for any Markov chain $\{X_n\}$ in $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ one can construct a function $f$ and a sequence of independent identically distributed random variables $\{\xi_n\}$ such that (17.1.1) holds. We will elucidate this statement in the simplest case when both $\mathcal{X}$ and $\mathcal{Y}$ coincide with the real line $\mathbb{R}$. Let $P(x, B)$, $B \in \mathfrak{B}$, be the transition function of the chain $\{X_n\}$, and $F_x(t) = P(x, (-\infty, t))$ the distribution function of $X_1(x)$ ($X_0 = x$). Then if $F_x^{-1}(t)$ is the function inverse (in $t$) to $F_x(t)$ and $\alpha$ is a random variable uniformly distributed over $[0, 1]$, then, as we saw before (see e.g. Sect. 6.2), the random variable $F_x^{-1}(\alpha)$ will have the distribution function $F_x(t)$. Therefore, if $\{\alpha_n\}$ is a sequence of independent random variables uniformly distributed over $[0, 1]$, then the sequence $X_{n+1} = F_{X_n}^{-1}(\alpha_n)$ will have the same distribution as the original chain $\{X_n\}$. Thus the Markov chain is an s.r.s. with the function $f(x, y) = F_x^{-1}(y)$ and driving sequence $\{\alpha_n\}$ of independent random variables uniformly distributed over $[0, 1]$.
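
The representation $X_{n+1} = F_{X_n}^{-1}(\alpha_n)$ can be sketched in code for a chain with finitely many real states. Everything below is an illustrative assumption rather than part of the text: the three-state transition table `transition`, the helper names `inverse_cdf` and `run_srs`, and the use of the generalised inverse $\min\{t : \text{cumulative probability up to } t \ge y\}$ in place of $F_x^{-1}$, which is the usual convention when the distribution function has jumps.

```python
import random

def inverse_cdf(x, y, transition):
    """Generalised inverse of the distribution of X_1 given X_0 = x, evaluated at y in [0, 1]."""
    cumulative = 0.0
    states = sorted(transition[x])
    for t in states:
        cumulative += transition[x][t]
        if y <= cumulative:
            return t
    return states[-1]  # guard against floating-point rounding of the total mass

def run_srs(x0, alphas, transition):
    """Iterate X_{n+1} = f(X_n, alpha_n) with f(x, y) = inverse_cdf(x, y)."""
    path = [x0]
    for a in alphas:
        path.append(inverse_cdf(path[-1], a, transition))
    return path

# Hypothetical transition probabilities P(x, {t}) on the states 0, 1, 2.
transition = {
    0: {0: 0.5, 1: 0.5},
    1: {0: 0.25, 1: 0.5, 2: 0.25},
    2: {1: 1.0},
}

# A fixed driving sequence makes the recursion fully deterministic.
print(run_srs(0, [0.3, 0.9, 0.8, 0.1, 0.6], transition))

# With i.i.d. uniform driving variables the path is a realisation of the chain.
rng = random.Random(12345)
print(run_srs(0, [rng.random() for _ in range(10)], transition))
```

The point of the construction is that all randomness sits in the driving sequence: once the values $\alpha_n$ are fixed, the trajectory is a deterministic function of $X_0$ and $\alpha_0, \alpha_1, \ldots$, exactly as in (17.1.1).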

For more general state spaces $\mathcal{X}$, a similar construction is possible if the $\sigma$-algebra $\mathfrak{B}_{\mathcal{X}}$ is countably-generated (i.e. is generated by a countable collection of sets from $\mathcal{X}$). This is always the case for Borel $\sigma$-algebras in $\mathcal{X} = \mathbb{R}^d$, $d \ge 1$ (see [22]).

One can always consider $f(\cdot, \xi_n)$ as a sequence of random mappings of the space $\mathcal{X}$ into itself. The principal problem we will be interested in is again (as in Chap. 13) that of the existence of the limiting distribution of $X_n$ as $n \to \infty$.

In the following sections we will consider three basic approaches to this problem.


17.2  Ergodicity  and  Renovating  Events.  Boundedness 
Conditions 

17.2.1  Ergodicity  of  Stochastic  Recursive  Sequences 

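We introduce the $\sigma$-algebras

$$\mathfrak{F}^{\xi}_{l,n} := \sigma\{\xi_k; \ l \le k \le n\},$$

$$\mathfrak{F}^{\xi}_{n} := \sigma\{\xi_k; \ k \le n\} = \mathfrak{F}^{\xi}_{-\infty,n},$$

$$\mathfrak{F}^{\xi} := \sigma\{\xi_k; \ -\infty < k < \infty\} = \mathfrak{F}^{\xi}_{-\infty,\infty}.$$

In the sequel, for the sake of definiteness and simplicity, we will assume the initial value $X_0$ to be constant unless otherwise stated.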

Definition 17.2.1 An event $A \in \mathfrak{F}^{\xi}_{n+m}$, $m \ge 0$, is said to be renovating for the s.r.s. $\{X_n\}$ on the segment $[n, n+m]$ if there exists a measurable function $g : \mathcal{Y}^{m+1} \to \mathcal{X}$ such that, on the set $A$ (i.e. for $\omega \in A$),

$$X_{n+m+1} = g(\xi_n, \ldots, \xi_{n+m}). \tag{17.2.1}$$

It is evident that, for $\omega \in A$, relations of the form $X_{n+m+k+1} = g_k(\xi_n, \ldots, \xi_{n+m+k})$ will hold for all $k \ge 0$, where $g_k$ is a function depending on its arguments only and determined by the event $A$.

The sequence of events $\{A_n\}$, $A_n \in \mathfrak{F}^{\xi}_{n+m}$, where the integer $m$ is fixed, is said to be renovating for the s.r.s. $\{X_n\}$ if there exists an integer $n_0 \ge 0$ such that, for $n \ge n_0$, one has relation (17.2.1) for $\omega \in A_n$, the function $g$ being common for all $n$.

We will be mainly interested in “positive” renovating events, i.e. renovating events having positive probabilities $\mathbf{P}(A_n) > 0$.

The simplest example of a renovating event is the hitting by the sequence $X_n$ of a fixed point $x_0$: $A_n = \{X_n = x_0\}$ (here $m = 0$), although such an event could be of zero probability. Below we will consider a more interesting example.

The motivation behind the introduction of renovating events is as follows. After the trajectory $\{X_k, \xi_k\}$, $k \le n+m$, has entered a renovating set $A \in \mathfrak{F}^{\xi}_{n+m}$, the future evolution of the process will not depend on the values $\{X_k\}$, $k \le n+m$, but will be determined by the values of $\xi_n, \xi_{n+1}, \ldots$ only. It is not a complete “regeneration” of the process which we dealt with in Chap. 13 while studying Markov chains (first of all, because the $\xi_k$ are now, generally speaking, dependent), but it still enables us to establish ergodicity of the sequence $X_n$ (in approximately the same sense as in Chap. 13).

Note that, generally speaking, the event $A$ and hence the function $g$ may depend on the initial value $X_0$. If $X_0$ is random then a renovating event is to be taken from the $\sigma$-algebra $\mathfrak{F}^{\xi}_{n+m} \times \sigma(X_0)$.

In what follows it will be assumed that the sequence $\{\xi_n\}$ is stationary. The symbol $U$ will denote the measure preserving shift transformation of $\mathfrak{F}^{\xi}$-measurable random variables generated by $\{\xi_n\}$, so that $U\xi_n = \xi_{n+1}$, and the symbol $T$ will denote the shift transformation of sets (events) from the $\sigma$-algebra $\mathfrak{F}^{\xi}$: $\xi_{n+1}(\omega) = \xi_n(T\omega)$. The symbols $U^n$ and $T^n$, $n \ge 0$, will denote the powers (iterations) of these transformations respectively (so that $U^1 = U$, $T^1 = T$; $U^0$ and $T^0$ are identity transformations), while $U^{-n}$ and $T^{-n}$ are transformations inverse to $U^n$ and $T^n$, respectively.

A sequence of events $\{A_k\}$ is said to be stationary if $A_k = T^k A_0$ for all $k$.

Example 17.2.1 Consider a real-valued sequence

$$X_{n+1} = (X_n + \xi_n)^+, \quad X_0 = \mathrm{const} \ge 0, \quad n \ge 0, \tag{17.2.2}$$

where $x^+ = \max(0, x)$ and $\{\xi_n\}$ is a stationary metric transitive sequence. As we already know from Sect. 12.4, the sequence $\{X_n\}$ describes the dynamics of waiting times for customers in a single-channel service system. The difference is that in Sect. 12.4 the initial value has subscript 1 rather than 0, and that now the sequence $\{\xi_n\}$ has a more general nature. Furthermore, it was established in Sect. 12.4 that Eq. (17.2.2) has the solution


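$$X_{n+1} = \max(S_{n,n},\ X_0 + S_n), \tag{17.2.3}$$

where

$$S_n := \sum_{k=0}^{n} \xi_k, \quad S_{n,k} := \max_{-1 \le j \le k} S_{nj}, \quad S_{nj} := \sum_{i=n-j}^{n} \xi_i, \quad S_{n,-1} := 0 \tag{17.2.4}$$

(certain changes in the subscripts in comparison to Sect. 12.4 are caused by different indexing of the initial values). From representation (17.2.3) one can see that the event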


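$$B_n := \{X_0 + S_n \le 0,\ S_{n,n} = 0\} \in \mathfrak{F}^{\xi}_{n}$$

implies the event $\{X_{n+1} = 0\}$ and so is renovating for $m = 0$, $g(y) \equiv 0$. If $X_{n+1} = 0$ then

$$X_{n+2} = g_1(\xi_{n+1}) := \xi_{n+1}^+, \quad X_{n+3} = g_2(\xi_{n+1}, \xi_{n+2}) := (\xi_{n+1}^+ + \xi_{n+2})^+,$$

and so on do not depend on $X_0$.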

Now consider, for some $n_0 > 1$ and any $n \ge n_0$, the narrower event

$$A_n := \Big\{X_0 + \sup_{j \ge n_0} S_{nj} \le 0,\ S_{n,\infty} := \sup_{j \ge -1} S_{nj} = 0\Big\}$$

(we assume that the sequence $\{\xi_n\}$ is defined on the whole axis). Clearly, $A_n \subset B_n \subset \{X_{n+1} = 0\}$, so $A_n$ is a renovating event as well. But, unlike $B_n$, the renovating event $A_n$ is stationary: $A_n = T^n A_0$.

We assume now that $E\xi_0 < 0$ and show that in this case $P(A_0) > 0$ for sufficiently large $n_0$. In order to do this, we first establish that $P(S_{0,\infty} = 0) > 0$. Since, by the ergodic theorem, $S_{0j} \to -\infty$ a.s. as $j \to \infty$, we see that $S_{0,\infty}$ is a proper random variable and there exists a $v$ such that $P(S_{0,\infty} < v) > 0$. Because the supremum $S_{0,\infty}$ is attained at some finite index $j$, the total probability formula gives

$$0 < P(S_{0,\infty} < v) \le \sum_{j=-1}^{\infty} P\Big(S_{0j} < v,\ \sup_{k \ge j}(S_{0k} - S_{0j}) = 0\Big).$$

Therefore there exists a $j$ such that

$$P\Big(\sup_{k \ge j}(S_{0k} - S_{0j}) = 0\Big) > 0.$$

But the supremum in the last expression has the same distribution as $S_{0,\infty}$. This proves that $p := P(S_{0,\infty} = 0) > 0$. Next, since $S_{0j} \to -\infty$ a.s., one also has $\sup_{j \ge k} S_{0j} \to -\infty$ a.s. as $k \to \infty$. Therefore, $P(\sup_{j \ge k} S_{0j} < -X_0) \to 1$ as $k \to \infty$, and hence there exists an $n_0$ such that

$$P\Big(\sup_{j \ge n_0} S_{0j} < -X_0\Big) > 1 - \frac{p}{2}.$$

Since $P(AB) \ge P(A) + P(B) - 1$ for any events $A$ and $B$, the aforesaid implies that $P(A_0) \ge p/2 > 0$.

In the assertions below, we will use the existence of stationary renovating events $A_n$ with $P(A_0) > 0$ as a condition ensuring convergence of the s.r.s. $X_n$ to a stationary sequence. However, in the last example such convergence can be established directly. Let $E\xi_0 < 0$. Then by (17.2.3), for any fixed $v$,

$$P(X_{n+1} > v) = P(S_{n,n} > v) + P(S_{n,n} \le v,\ X_0 + S_n > v),$$

where evidently

$$P(X_0 + S_n > v) \to 0, \quad P(S_{n,n} > v) \uparrow P(S_{0,\infty} > v)$$

as $n \to \infty$. Hence the following limit exists:

$$\lim_{n \to \infty} P(X_n > v) = P(S_{0,\infty} > v). \tag{17.2.5}$$
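The explicit solution (17.2.3) can be checked numerically against the recursion (17.2.2). The following is a minimal sketch, assuming i.i.d. Gaussian steps with negative mean as one particular stationary driving sequence; the function names and parameter values are illustrative and not taken from the text.

```python
import random

def lindley_path(xi, x0):
    """Iterate X_{n+1} = max(0, X_n + xi_n) from X_0 = x0, as in (17.2.2)."""
    xs = [x0]
    for step in xi:
        xs.append(max(0.0, xs[-1] + step))
    return xs

def explicit_solution(xi, x0, n):
    """Evaluate max(S_{n,n}, X_0 + S_n) from (17.2.3)-(17.2.4)."""
    s_n = sum(xi[: n + 1])            # S_n = xi_0 + ... + xi_n
    best, partial = 0.0, 0.0          # the term j = -1 contributes 0
    for j in range(n + 1):            # partial sum xi_n + ... + xi_{n-j}
        partial += xi[n - j]
        best = max(best, partial)
    return max(best, x0 + s_n)

rng = random.Random(0)
xi = [rng.gauss(-0.5, 1.0) for _ in range(200)]   # assumed driving sequence, E xi < 0
path = lindley_path(xi, x0=3.0)
max_gap = max(abs(path[n + 1] - explicit_solution(xi, 3.0, n)) for n in range(200))
print(max_gap)   # differs from zero only by floating-point rounding
```

The two computations add the same numbers in a different order, so they agree up to rounding error rather than exactly.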

Recall that in the above example the sequence of events $A_n$ becomes renovating for $n \ge n_0$. But we can define other renovating events $C_n$ along with a number $m$ and function $g : \mathbb{R}^{m+1} \to \mathbb{R}$ as follows:

$$m := n_0, \quad C_n := T^m A_n, \quad g(y_0, \ldots, y_m) :\equiv 0.$$

The events $C_n \in \mathfrak{F}^{\xi}_{n+m}$ are renovating for $\{X_n\}$ on the segment $[n, n+m]$ for all $n \ge 0$, so in this case the $n_0$ in the definition of a renovating sequence will be equal to 0.

A similar argument can also be used in the general case for arbitrary renovating events. Therefore we will assume in the sequel that the number $n_0$ from the definition of renovating events is equal to zero.

In  the  general  case,  the  following  assertion  is  valid. 


Theorem 17.2.1 Let $\{\xi_n\}$ be an arbitrary stationary sequence and for the s.r.s. $\{X_n\}$ there exists a sequence of renovating events $\{A_n\}$ such that

$$P\Big(\bigcup_{j=1}^{n} A_j T^{-s} A_{j+s}\Big) \to 1 \quad \text{as } n \to \infty \tag{17.2.6}$$

uniformly in $s \ge 1$. Then one can define, on a common probability space with $\{X_n\}$, a stationary sequence $\{X^n := U^n X^0\}$ satisfying the equations $X^{n+1} = f(X^n, \xi_n)$ and such that

$$P\{X_k = X^k \text{ for all } k \ge n\} \to 1 \quad \text{as } n \to \infty. \tag{17.2.7}$$

If the sequence $\{\xi_n\}$ is metric transitive and the events $A_n$ are stationary, then the relations $P(A_0) > 0$ and $P(\bigcup_{n=0}^{\infty} A_n) = 1$ are equivalent and imply (17.2.6) and (17.2.7).


Note also that if we introduce the measure $\pi(B) = P(X^0 \in B)$ (as we did in Chap. 13), then (17.2.7) will imply convergence in total variation:

$$\sup_{B} \big|P(X_n \in B) - \pi(B)\big| \to 0 \quad \text{as } n \to \infty.$$


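Proof of Theorem 17.2.1 First we show that (17.2.6) implies that

$$P\Big(\bigcup_{k=0}^{\infty} \{X_{n+k} \ne U^{-s} X_{n+k+s}\}\Big) \to 0 \quad \text{as } n \to \infty \tag{17.2.8}$$

uniformly in $s \ge 0$. For a fixed $s \ge 1$, consider the sequence $X_j^s = U^{-s} X_{s+j}$. It is defined for $j \ge -s$, and

$$X_{-s}^s = X_0, \quad X_{-s+1}^s = f(X_{-s}^s, \xi_{-s}) = f(X_0, \xi_{-s})$$

and so on. It is clear that the event

$$\{X_j = X_j^s \text{ for some } j \in [0, n]\}$$

implies the event

$$\{X_{n+k} = X_{n+k}^s \text{ for all } k \ge 0\}.$$

We show that

$$P\Big(\bigcup_{j=1}^{n} \{X_j = X_j^s\}\Big) \to 1 \quad \text{as } n \to \infty.$$

For simplicity’s sake put $m = 0$. Then, for the event $X_{j+1} = X_{j+1}^s$ to occur, it suffices that the events $A_j$ and $T^{-s} A_{j+s}$ occur simultaneously. In other words,

$$\bigcup_{j=0}^{n-1} A_j T^{-s} A_{j+s} \subset \bigcup_{j=1}^{n} \{X_j = X_j^s\} \subset \bigcap_{k=0}^{\infty} \{X_{n+k} = X_{n+k}^s\}.$$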

Therefore (17.2.6) implies (17.2.8) and convergence

$$P(X_k^n \ne X_k^{n+s}) \to 0 \quad \text{as } n \to \infty$$

uniformly in $k \ge 0$ and $s \ge 0$, where $X_k^n := U^{-n} X_{n+k}$. If we introduce the metric $\rho$ putting $\rho(x, y) := 1$ for $x \ne y$, $\rho(x, x) = 0$, then the aforesaid means that, for any $\delta > 0$, there exists an $N$ such that

$$P(\rho(X_k^n, X_k^{n+s}) > \delta) = P(\rho(X_k^n, X_k^{n+s}) \ne 0) < \delta$$

for $n \ge N$ and any $k \ge 0$, $s \ge 0$, i.e. $X_k^n$ is a Cauchy sequence with respect to convergence in probability for each $k$. Because any space $\mathcal{X}$ is complete with such a metric, there exists a random variable $X^k$ such that $X_k^n \xrightarrow{p} X^k$ as $n \to \infty$ (see Lemma 17.4.2). Due to the specific nature of the metric $\rho$ this means that

$$P(X_k^n \ne X^k) \to 0 \quad \text{as } n \to \infty. \tag{17.2.9}$$

The sequence $X^k$ is stationary. Indeed, as $n \to \infty$,

$$P(X^{k+1} \ne UX^k) = P(X_{k+1}^n \ne UX_k^n) + o(1) = P(X_{k+1}^n \ne X_{k+1}^{n-1}) + o(1) = o(1).$$

Since the probability $P(X^{k+1} \ne UX^k)$ does not depend on $n$, $X^{k+1} = UX^k$ a.s. Further, $X_{n+k+1} = f(X_{n+k}, \xi_{n+k})$, and therefore

$$X_{k+1}^n = U^{-n} f(X_{n+k}, \xi_{n+k}) = f(X_k^n, \xi_k). \tag{17.2.10}$$

The left and right-hand sides here converge in probability to $X^{k+1}$ and $f(X^k, \xi_k)$, respectively. This means that $X^{k+1} = f(X^k, \xi_k)$.

To prove convergence (17.2.7) it suffices to note that, by virtue of (17.2.10), the values $X_k^n$ and $X^k$, after having become equal for some $k$, will never be different for greater values of $k$. Therefore, as well as (17.2.9) one has the relation

$$P\Big(\bigcup_{k \ge 0} \{X_k^n \ne X^k\}\Big) = P\Big(\bigcup_{k \ge 0} \{X_{k+n} \ne X^{k+n}\}\Big) \to 0 \quad \text{as } n \to \infty,$$

which is equivalent to (17.2.7).

The last assertion of the theorem follows from Theorem 16.2.5. The theorem is proved. □

Remark 17.2.1 It turns out that condition (17.2.6) is also a necessary one for convergence (17.2.7) (see [6]). For more details on convergence of stochastic recursive sequences and their generalisations, and also on the relationship between (17.2.6) and conditions (I) and (II) from Chap. 13, see [6].

In Example 17.2.1 the sequence $X^k$ was actually found in an explicit form (see (17.2.3) and (17.2.5)):

$$X^k = S_{k-1,\infty} = \sup_{j \ge -1} S_{k-1,j}. \tag{17.2.11}$$

These random variables are proper by Corollary 16.3.1. It is not hard to also see that, for $X_0 = 0$, one has (see (17.2.3))

$$U^{-n} X_{n+k} \uparrow X^k. \tag{17.2.12}$$


17.2.2  Boundedness  of  Random  Sequences 


Consider now conditions of boundedness of an s.r.s. in spaces $\mathcal{X} = [0, \infty)$ and $\mathcal{X} = (-\infty, \infty)$. Assertions about boundedness will be stated in terms of existence of stationary majorants, i.e. stationary sequences $M_n$ such that

$$X_n \le M_n \quad \text{for all } n.$$

Results of this kind will be useful for constructing stationary renovating sequences.

Majorants will be constructed for a class of random sequences more general than stochastic recursive sequences. Namely, we will consider the class of random sequences satisfying the inequalities

$$X_{n+1} \le (X_n + h(X_n, \xi_n))^+, \tag{17.2.13}$$

where the measurable function $h$ will in turn be bounded by rather simple functions of $X_n$ and $\xi_n$. The sequence $\{\xi_n\}$ will be assumed given on the whole axis.


Theorem 17.2.2 Assume that there exist a number $N > 0$ and a measurable function $g_1$ with $Eg_1(\xi_n) < 0$ such that (17.2.13) holds with

$$h(x, y) \le \begin{cases} g_1(y) & \text{for } x > N, \\ g_1(y) + N - x & \text{for } x \le N. \end{cases} \tag{17.2.14}$$

If $X_0 \le M < \infty$, then the stationary sequence

$$M_n = \max(M, N) + \sup_{j \ge -1} S_{n-1,j}, \tag{17.2.15}$$

where $S_{n,-1} = 0$ and $S_{k,j} = g_1(\xi_k) + \cdots + g_1(\xi_{k-j})$ for $j \ge 0$, is a majorant for $X_n$.


Proof For brevity’s sake, put $\zeta_i := g_1(\xi_i)$, $Z := \max(M, N)$, and $Z_n := X_n - Z$. Then $Z_n$ will satisfy the following inequalities:

$$Z_{n+1} \le \begin{cases} (Z_n + Z + \zeta_n)^+ - Z \le (Z_n + \zeta_n)^+ & \text{for } Z_n > N - Z, \\ (N + \zeta_n)^+ - Z \le \zeta_n^+ & \text{for } Z_n \le N - Z. \end{cases}$$

Consider now a sequence $\{Y_n\}$ defined by the relations $Y_0 = 0$ and

$$Y_{n+1} = (Y_n + \zeta_n)^+.$$

Assume that $Z_n \le Y_n$. If $Z_n > N - Z$ then

$$Z_{n+1} \le (Z_n + \zeta_n)^+ \le (Y_n + \zeta_n)^+ = Y_{n+1}.$$

If $Z_n \le N - Z$ then

$$Z_{n+1} \le \zeta_n^+ \le (Y_n + \zeta_n)^+ = Y_{n+1}.$$

Because $Z_0 \le 0 = Y_0$, it is evident that $Z_n \le Y_n$ for all $n$. But we know the solution of the equation for $Y_n$ and, by virtue of (17.2.11) and (17.2.13),

$$X_n - Z \le \sup_{j \ge -1} S_{n-1,j}.$$


The  theorem  is  proved.  □ 
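The comparison step of the proof, $X_n - Z \le Y_n$, can be observed on a simulated path. The sketch below is only an illustration under stated assumptions: $g_1(y) = y$, a hypothetical drift `h` that satisfies (17.2.14), uniform i.i.d. driving noise, and arbitrary constants `N` and `M`; none of these specific choices come from the text.

```python
import random

N, M = 2.0, 5.0     # hypothetical threshold N and bound on the initial value X_0 <= M

def h(x, y):
    """Illustrative drift obeying (17.2.14) with g1(y) = y:
    h = g1(y) for x > N, and h = g1(y) + (N - x)/2 <= g1(y) + N - x for x <= N."""
    return y if x > N else y + 0.5 * (N - x)

rng = random.Random(1)
xi = [rng.uniform(-1.5, 1.0) for _ in range(5000)]   # assumed noise, E g1(xi) = -0.25 < 0

x, y_cmp, z = M, 0.0, max(M, N)    # X_0 = M, Y_0 = 0, Z = max(M, N)
violations = 0
for step in xi:
    if x > z + y_cmp + 1e-9:       # check X_n <= Z + Y_n (small slack for rounding)
        violations += 1
    x = max(0.0, x + h(x, step))         # sequence satisfying (17.2.13) with equality
    y_cmp = max(0.0, y_cmp + step)       # comparison recursion Y_{n+1} = (Y_n + zeta_n)^+
print(violations)
```

Since $Y_n$ never exceeds the stationary supremum in (17.2.15), a path that stays below $Z + Y_n$ also stays below the majorant $M_n$.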

Theorem 17.2.2A Assume that there exist a number $N > 0$ and measurable functions $g_1$ and $g_2$ such that

$$Eg_1(\xi_n) < 0, \quad Eg_2(\xi_n) < \infty \tag{17.2.16}$$

and

$$h(x, y) \le \begin{cases} g_1(y) & \text{for } x > N, \\ g_1(y) + g_2(y) & \text{for } x \le N. \end{cases} \tag{17.2.17}$$

If $X_0 \le M < \infty$, then the conditions of Theorem 17.2.2 are satisfied (possibly for other $N$ and $g_1$) and for $X_n$ there exists a stationary majorant of the form (17.2.15).


Proof We set $g := -Eg_1(\xi_n) > 0$ and find $L > 0$ such that $E(g_2(\xi_n);\ g_2(\xi_n) > L) \le g/2$. Introduce the function

$$g_1^*(y) := g_1(y) + g_2(y) I(g_2(y) > L).$$

Then $Eg_1^*(\xi_n) \le -g/2 < 0$ and

$$\begin{aligned} h(x, y) &\le g_1(y) + g_2(y) I(x \le N) \\ &\le g_1^*(y) + g_2(y) I(x \le N) - g_2(y) I(g_2(y) > L) \\ &\le g_1^*(y) + L\, I(x \le N) \le g_1^*(y) + (L + N - x) I(x \le N) \\ &\le g_1^*(y) + (L + N - x) I(x \le L + N). \end{aligned}$$

This means that inequalities (17.2.14) hold with $N$ replaced with $N^* = N + L$. The theorem is proved. □

Note again that in Theorems 17.2.2 and 17.2.2A we did not assume that $\{X_n\}$ is an s.r.s.

The reader will notice the similarity of the conditions of Theorems 17.2.2 and 17.2.2A to the boundedness condition in Sect. 15.5, Theorem 13.7.3 and Corollary 13.7.1.

The form of the assertions of Theorems 17.2.2 and 17.2.2A enables one to construct stationary renovating events for a rather wide class of nonnegative stochastic recursive sequences (so that $\mathcal{X} = [0, \infty)$) having, say, a “positive atom” at 0. It is convenient to write such sequences in the form

$$X_{n+1} = (X_n + h(X_n, \xi_n))^+. \tag{17.2.18}$$

Example 17.2.2 Let an s.r.s. (see (17.1.1)) be described by Eq. (17.2.18) and satisfy conditions (17.2.14) or (17.2.17), where the function $h$ is sufficiently “regular” to ensure that

$$B_{n,T} = \bigcap_{t \le T} \{h(t, \xi_n) \le -t\}$$

is an event for any $T$. (For instance, it is enough to require $h(t, v)$ to have at most a countable set of discontinuity points $t$. Then the set $B_{n,T}$ can be expressed as the intersection of countably many events $\bigcap_k \{h(t_k, \xi_n) \le -t_k\}$, where $\{t_k\}$ form a countable set dense on $[0, T]$.) Furthermore, let there exist an $L > 0$ such that

$$P(M_n < L,\ B_{n,L}) > 0 \tag{17.2.19}$$

($M_n$ was defined in (17.2.15)). Then the event $A_n = \{M_n < L\} B_{n,L}$ is clearly a positive stationary renovating event with the function $g(y) = (h(0, y))^+$, $m = 0$. (On the set $A_n \in \mathfrak{F}^{\xi}_{n}$ we have $X_{n+1} = 0$, $X_{n+2} = h(0, \xi_{n+1})^+$ and so on.) Therefore, an s.r.s. satisfying (17.2.18) satisfies the conditions of Theorem 17.2.1 and is ergodic in the sense of assertion (17.2.7).

It can happen that, from a point $t \le L$, it would be impossible to reach the point 0 in one step, but it could be done in $m > 1$ steps. If $B$ is the set of sequences $(\xi_n, \ldots, \xi_{n+m})$ that effect such a transition, and $P(M_n < L,\ B) > 0$, then $A_n = \{M_n < L\} B$ will also be stationary renovating events.


17.3 Ergodicity Conditions Related to the Monotonicity of $f$

Now we consider ergodicity conditions for stochastic recursive sequences that are related to the analytic properties of the function $f$ from (17.1.1). As we already noted, the sequence $f(x, \xi_k)$, $k = 1, 2, \ldots$, may be considered as a sequence of random transformations of the space $\mathcal{X}$. Relation (17.1.1) shows that $X_{n+1}$ is the result of the application of $n + 1$ random transformations $f(\cdot, \xi_k)$, $k = 0, 1, \ldots, n$, to the initial value $X_0 = x \in \mathcal{X}$. Denoting by $\xi_n^{n+k}$ the vector $\xi_n^{n+k} = (\xi_n, \ldots, \xi_{n+k})$ and by $f_k$ the $k$-th iteration of the function $f$: $f_1(x, y_1) = f(x, y_1)$, $f_2(x, y_1, y_2) = f(f(x, y_1), y_2)$ and so on, we can re-write (17.1.1) for $X_0 = x$ in the form

$$X_{n+1} = X_{n+1}(x) = f_{n+1}(x, \xi_0^n),$$

so that the “forward” and “backward” equations hold true:

$$f_{n+1}(x, \xi_0^n) = f(f_n(x, \xi_0^{n-1}), \xi_n) = f_n(f(x, \xi_0), \xi_1^n). \tag{17.3.1}$$
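The forward and backward equations (17.3.1) simply say that the composition of the $n + 1$ random maps can be split off either at its last or at its first factor. A minimal sketch, assuming the map $f(x, y) = (x + y)^+$ of Example 17.2.1 and i.i.d. Gaussian noise (both are illustrative choices; the helper name `iterate` is hypothetical):

```python
from functools import reduce
import random

def f(x, y):
    """One-step map of Example 17.2.1: f(x, y) = (x + y)^+."""
    return max(0.0, x + y)

def iterate(x, ys):
    """k-th iteration f_k(x, y_1, ..., y_k): apply f along ys from left to right."""
    return reduce(f, ys, x)

rng = random.Random(2)
xi = [rng.gauss(-0.2, 1.0) for _ in range(50)]   # xi_0, ..., xi_n with n = 49
x = 1.5

full = iterate(x, xi)                          # f_{n+1}(x, xi_0^n)
forward = f(iterate(x, xi[:-1]), xi[-1])       # f(f_n(x, xi_0^{n-1}), xi_n)
backward = iterate(f(x, xi[0]), xi[1:])        # f_n(f(x, xi_0), xi_1^n)
print(full == forward == backward)
```

All three expressions perform the same arithmetic operations in the same order, so the equality holds exactly, not just up to rounding.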

In the present section we will be studying stochastic recursive sequences for which the function $f$ from representation (17.1.1) is monotone in the first argument. To this end, we need to assume that a partial order relation “$\ge$” is defined in the space $\mathcal{X}$. In the space $\mathcal{X} = \mathbb{R}^d$ of vectors $x = (x(1), \ldots, x(d))$ (or its subspaces) the order relation can be introduced in a natural way by putting $x_1 \ge x_2$ if $x_1(k) \ge x_2(k)$ for all $k$.

Furthermore, we will assume that, for each non-decreasing sequence $x_1 \le x_2 \le \cdots \le x_n \le \ldots$, there exists a limit $x \in \mathcal{X}$, i.e. the smallest element $x \in \mathcal{X}$ for which $x_k \le x$ for all $k$. In that case we will write $x_k \uparrow x$ or $\lim_{k \to \infty} x_k = x$. In $\mathcal{X} = \mathbb{R}^d$ such convergence will mean conventional convergence. To facilitate this, we will need to complete the space $\mathbb{R}^d$ by adding points with infinite components.

Theorem 17.3.1 (Loynes) Suppose that the transformation $f = f(x, y)$ and space $\mathcal{X}$ satisfy the following conditions:

(1) there exists an $x_0 \in \mathcal{X}$ such that $f(x_0, y) \ge x_0$ for all $y \in \mathcal{Y}$;

(2) the function $f$ is monotone in the first argument: $f(x_1, y) \ge f(x_2, y)$ if $x_1 \ge x_2$;

(3) the function $f$ is continuous in the first argument with respect to the above convergence: $f(x_n, y) \uparrow f(x, y)$ if $x_n \uparrow x$.

Then there exists a stationary random sequence $\{X^n\}$ satisfying Eq. (17.1.1): $X^{n+1} = UX^n = f(X^n, \xi_n)$, such that

$$U^{-n} X_{n+s}(x_0) \uparrow X^s \quad \text{as } n \to \infty, \tag{17.3.2}$$

where convergence takes place for all elementary outcomes.

Since the distributions of $X_n$ and $U^{-n} X_n$ coincide, in the case where convergence of random variables $\eta_n \uparrow \eta$ means convergence (in a certain sense) of their distributions (as is the case when $\mathcal{X} = \mathbb{R}^d$), Theorem 17.3.1 also implies convergence of the distributions of $X_n$ to that of $X^0$ as $n \to \infty$.

Remark 17.3.1 A substantial drawback of this theorem is that it holds only for a single initial value $X_0 = x_0$. This drawback disappears if the point $x_0$ is accessible with probability 1 from any $x \in \mathcal{X}$, and $\xi_k$ are independent. In that case $x_0$ is likely to be a positive atom, and Theorem 13.6.1 for Markov chains is also applicable.

The limiting sequence $X^s$ in (17.3.2) can be “improper” (in spaces $\mathcal{X} = \mathbb{R}^d$ it may assume infinite values). The sequence $X^s$ will be proper if the s.r.s. $X_n$ satisfies, say, the conditions of the theorems of Sect. 15.5 or the conditions of Theorem 17.2.2.

Proof of Theorem 17.3.1 Put

$$v_s^{-k} := f_{k+s}(x_0, \xi_{-k}^{s-1}) = U^{-k} f_{k+s}(x_0, \xi_0^{s+k-1}) = U^{-k} X_{k+s}(x_0).$$

Here the superscript $-k$ indicates the number of the element of the driving sequence $\{\xi_n\}_{n=-\infty}^{\infty}$ such that the elements of this sequence starting from that number are used for constructing the s.r.s. The subscript $s$ is the “time epoch” at which we observe the value of the s.r.s. From the “backward” equation in (17.3.1) we get that

$$v_s^{-k-1} = f_{k+s}(f(x_0, \xi_{-k-1}), \xi_{-k}^{s-1}) \ge f_{k+s}(x_0, \xi_{-k}^{s-1}) = v_s^{-k}.$$

This means that the sequence $v_s^{-k}$ increases as $k$ grows, and therefore there exists a random variable $X^s \in \mathcal{X}$ such that

$$v_s^{-k} = U^{-k} X_{k+s}(x_0) \uparrow X^s \quad \text{as } k \to \infty.$$

Further, $v_s^{-k}$ is a function of $\xi_{-k}^{s-1}$. Therefore, $X^s$ is a function of $\xi_{-\infty}^{s-1}$:

$$X^s = G(\xi_{-\infty}^{s-1}).$$

Hence

$$UX^s = UG(\xi_{-\infty}^{s-1}) = G(\xi_{-\infty}^{s}) = X^{s+1},$$

which means that $\{X^s\}$ is stationary. Using the “forward” equation from (17.3.1), we obtain that

$$v_s^{-k-1} = f(f_{k+s}(x_0, \xi_{-k-1}^{s-2}), \xi_{s-1}) = f(v_{s-1}^{-k-1}, \xi_{s-1}).$$

Passing to the limit as $k \to \infty$ gives, since $f$ is continuous, that

$$X^s = f(X^{s-1}, \xi_{s-1}).$$

The  theorem  is  proved.  □ 

Example 17.2.1 clearly satisfies all the conditions of Theorem 17.3.1 with $\mathcal{X} = [0, \infty)$, $x_0 = 0$, and $f(x, y) = (x + y)^+$.
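The monotonicity used in the proof of Loynes' theorem can be seen on a fixed realisation of the past of the driving sequence: starting the recursion from $x_0 = 0$ further and further back in time never decreases the value observed at time 0. The sketch below assumes $f(x, y) = (x + y)^+$ and i.i.d. Gaussian steps with negative mean; the list `past` and the function `backward_value` are hypothetical names introduced for illustration.

```python
import random

def f(x, y):
    """Monotone map of Example 17.2.1; f(0, y) >= 0, so x_0 = 0 fits condition (1)."""
    return max(0.0, x + y)

rng = random.Random(3)
K = 300
past = [rng.gauss(-0.3, 1.0) for _ in range(K)]   # past[i] plays the role of xi_{-(i+1)}

def backward_value(k):
    """v_0^{-k} = f_k(x_0, xi_{-k}, ..., xi_{-1}) with x_0 = 0."""
    x = 0.0
    for i in range(k - 1, -1, -1):    # apply xi_{-k} first and xi_{-1} last
        x = f(x, past[i])
    return x

values = [backward_value(k) for k in range(K + 1)]
is_monotone = all(a <= b for a, b in zip(values, values[1:]))
print(is_monotone, values[-1])
```

With a negative mean the values stabilise after finitely many steps on a typical path, which corresponds to the limit $X^0$ being proper.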


17.4  Ergodicity  Conditions  for  Contracting  in  Mean  Lipschitz 
Transformations 

In this section we will assume that $\mathcal{X}$ is a complete separable metric space with metric $\rho$. Consider the following conditions on the iterations $X_k(x) = f_k(x, \xi_0^{k-1})$.

Condition (B) (boundedness). For some $x_0 \in \mathcal{X}$ and any $\delta > 0$, there exists an $N = N_\delta$ such that, for all $n \ge 1$,

$$P(\rho(x_0, X_n(x_0)) > N) = P(\rho(x_0, f_n(x_0, \xi_0^{n-1})) > N) < \delta.$$

It is not hard to see that condition (B) holds (possibly with a different $N$) as soon as we can establish that, for some $m \ge 1$, the above inequality holds for all $n \ge m$.

Condition (B) is clearly met for stochastic recursive sequences satisfying the conditions of Theorems 17.2.2 and 17.2.2A or the theorems of Sect. 15.5.

Condition (C) (contraction in mean). The function $f$ is continuous in the first argument and there exist $m \ge 1$, $\beta > 0$ and a measurable function $q : \mathbb{R}^m \to \mathbb{R}$ such that, for any $x_1$ and $x_2$,

$$\rho(f_m(x_1, \xi_0^{m-1}), f_m(x_2, \xi_0^{m-1})) \le q(\xi_0^{m-1}) \rho(x_1, x_2),$$

$$m^{-1} E \ln q(\xi_0^{m-1}) \le -\beta < 0.$$

Observe that conditions (B) and (C) are, generally speaking, not related to each other. Let, for instance, $\mathcal{X} = \mathbb{R}$, $X_0 \ge 0$, $\xi_n \ge 0$, $\rho(x, y) = |x - y|$, and $f(x, y) = bx + y$, so that

$$X_{n+1} = bX_n + \xi_n.$$

Then condition (C) is clearly satisfied for $0 < b < 1$, since

$$|f(x_1, y) - f(x_2, y)| = b|x_1 - x_2|.$$
At the same time, condition (B) will be satisfied if and only if $E \ln \xi_0 < \infty$. Indeed, if $E \ln \xi_0 = \infty$, then the event $\{\ln \xi_k > -2k \ln b\}$ occurs infinitely often a.s. But $X_{n+1}$ has the same distribution as

$$b^{n+1} X_0 + \sum_{k=0}^{n} b^k \xi_k = b^{n+1} X_0 + \sum_{k=0}^{n} \exp\{k \ln b + \ln \xi_k\},$$

where, in the sum on the right-hand side, the number of terms exceeding $\exp\{-k \ln b\}$ increases unboundedly as $n$ grows. This means that $X_{n+1} \xrightarrow{p} \infty$ as $n \to \infty$. The case $E \ln \xi_0 < \infty$ is treated in a similar way. The fact that (B), generally speaking, does not imply (C) is obvious.
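For the linear recursion $X_{n+1} = bX_n + \xi_n$ the contraction in condition (C) is exact: two paths driven by the same noise differ by $b^n |x_1 - x_2|$ after $n$ steps, whatever the noise is. A minimal sketch, assuming $b = 0.7$ and exponentially distributed nonnegative noise (illustrative choices only; the function name `run` is hypothetical):

```python
import random

b = 0.7                                          # contraction coefficient, 0 < b < 1
rng = random.Random(4)
xi = [rng.expovariate(1.0) for _ in range(40)]   # assumed nonnegative driving sequence

def run(x0):
    """Iterate X_{n+1} = b * X_n + xi_n from X_0 = x0 and return the whole path."""
    path = [x0]
    for step in xi:
        path.append(b * path[-1] + step)
    return path

p1, p2 = run(0.0), run(10.0)                     # same noise, different initial values
gaps = [abs(u - v) for u, v in zip(p1, p2)]
expected = [10.0 * b ** n for n in range(len(gaps))]
max_err = max(abs(g - e) for g, e in zip(gaps, expected))
print(max_err)   # zero up to floating-point rounding
```

The coupling of the two paths does not depend on the noise distribution, which is why (C) holds here even when (B) fails.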

As before, we will assume that the “driving” stationary sequence $\{\xi_n\}_{n=-\infty}^{\infty}$ is given on the whole axis. Denote by $U$ the respective distribution preserving shift operator.

Convergence in probability and a.s. of a sequence of $\mathcal{X}$-valued random variables $\eta_n \in \mathcal{X}$ ($\eta_n \xrightarrow{p} \eta$, $\eta_n \xrightarrow{a.s.} \eta$) is defined in the natural way by the relations $P(\rho(\eta_n, \eta) > \delta) \to 0$ as $n \to \infty$ and $P(\rho(\eta_k, \eta) > \delta \text{ for some } k \ge n) \to 0$ as $n \to \infty$ for any $\delta > 0$, respectively.

Theorem 17.4.1 Assume that conditions (B) and (C) are met. Then there exists a stationary sequence $\{X^n\}$ satisfying (17.1.1):

$$X^{n+1} = UX^n = f(X^n, \xi_n)$$

such that, for any fixed $x$,

$$U^{-n} X_{n+s}(x) \xrightarrow{a.s.} X^s \quad \text{as } n \to \infty. \tag{17.4.1}$$

This convergence is uniform in $x$ over any bounded subset of $\mathcal{X}$.

Theorem 17.4.1 implies the weak convergence, as $n \to \infty$, of the distributions of $X_n(x)$ to that of $X^0$. Condition (B) is clearly necessary for ergodicity. As the example of a generalised autoregressive process below shows, condition (C) is also necessary in some cases.

Set $Y_n := U^{-n} X_n(x_0)$, where $x_0$ is from condition (B). We will need the following auxiliary result.

Lemma 17.4.1 Assume that conditions (B) and (C) are met and the stationary sequence $\{q(\xi_{km}^{km+m-1})\}_{k=-\infty}^{\infty}$ is ergodic. Then, for any $\delta > 0$, there exists an $n_\delta$ such that, for all $k \ge 0$,

$$\sup_{k \ge 0} P(\rho(Y_{n+k}, Y_n) < \delta \text{ for all } n \ge n_\delta) \ge 1 - \delta. \tag{17.4.2}$$

For ergodicity of $\{q(\xi_{km}^{km+m-1})\}_{k=-\infty}^{\infty}$ it suffices that the transformation $T^m$ is metric transitive.

The lemma means that, with probability 1, the distance $\rho(Y_{n+k}, Y_n)$ tends to zero uniformly in $k$ as $n \to \infty$. Relation (17.4.2) can also be written as $P(A_\delta) \le \delta$, where

$$A_\delta := \bigcup_{n \ge n_\delta} \{\rho(Y_{n+k}, Y_n) \ge \delta\}.$$


Proof of Lemma 17.4.1 By virtue of condition (B), there exists an $N = N_\delta$ such that, for all $k \ge 1$,

$$P(\rho(x_0, X_k(x_0)) > N) \le \frac{\delta}{3}.$$

Hence

$$P(A_\delta) \le \delta/3 + P(A_\delta;\ \rho(x_0, \theta_{n,k}) \le N).$$

The random variable $\theta_{n,k} := U^{-n-k} X_k(x_0)$ has the same distribution as $X_k(x_0)$. Next, by virtue of (C),

$$\begin{aligned} \rho(Y_{n+k}, Y_n) &\le \rho(f_{n+k}(x_0, \xi_{-n-k}^{-1}), f_n(x_0, \xi_{-n}^{-1})) \\ &\le q(\xi_{-m}^{-1})\, \rho(f_{n+k-m}(x_0, \xi_{-n-k}^{-m-1}), f_{n-m}(x_0, \xi_{-n}^{-m-1})) \\ &= q(\xi_{-m}^{-1})\, \rho(U^{-n-k} X_{n+k-m}(x_0), U^{-n} X_{n-m}(x_0)). \end{aligned} \tag{17.4.3}$$

Denote by $B_s$ the set of numbers $n$ of the form $n = lm + s$, $l = 0, 1, 2, \ldots$, $0 \le s < m$, and put

$$\lambda_j := \ln q(\xi_{-jm}^{-jm+m-1}), \quad j = 1, 2, \ldots.$$

Then, for $n \in B_s$, we obtain from (17.4.3) and similar relations that

$$\rho(Y_{n+k}, Y_n) \le \exp\Big\{\sum_{j=1}^{l} \lambda_j\Big\}\, \rho(U^{-n-k} X_{k+s}(x_0), U^{-n} X_s(x_0)), \tag{17.4.4}$$

where the last factor (denote it just by $\rho$) is bounded from above:

$$\rho \le \rho(x_0, U^{-n-k} X_{k+s}(x_0)) + \rho(x_0, U^{-n} X_s(x_0)).$$

The random variables $U^{-n} X_j(x_0)$ have the same distribution as $X_j(x_0)$. By virtue of (B), there exists an $N = N_\delta$ such that, for all $j \ge 1$,

$$P(\rho(x_0, X_j(x_0)) > N) < \frac{\delta}{4m}.$$

Hence, for all $n$, $k$ and $s$, we have $P(\rho > 2N) < \delta/(2m)$, and the right-hand side of (17.4.4) does not exceed $2N \exp\{\sum_{j=1}^{l} \lambda_j\}$ on the complement set $\{\rho \le 2N\}$.

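Because $E\lambda_j \le -m\beta < 0$ and the sequence $\{\lambda_j\}$ is metric transitive, by the ergodic Theorem 16.3.1 we have

$$\sum_{j=1}^{l} \lambda_j < -m\beta l/2$$

for all $l \ge l(\omega)$, where $l(\omega)$ is a proper random variable. Choose $l_1$ and $l_2$ so that the inequalities

$$-\frac{m\beta l_1}{2} < \ln \delta - \ln 2N, \quad P(l(\omega) > l_2) < \frac{\delta}{2m}$$

hold. Then, putting

$$l_\delta := \max(l_1, l_2), \quad n_\delta := m l_\delta, \quad A_\delta^s := \bigcup_{n \ge n_\delta,\ n \in B_s} \{\rho(Y_{n+k}, Y_n) \ge \delta\},$$

we obtain that

$$P(A_\delta^s) \le P(\rho > 2N) + P(A_\delta^s;\ \rho \le 2N) \le \frac{\delta}{2m} + P\Big(\bigcup_{l \ge l_\delta} \Big\{2N \exp\Big\{\sum_{j=1}^{l} \lambda_j\Big\} \ge \delta\Big\}\Big).$$

But the intersection of the events from the last term with $\{l_\delta \ge l(\omega)\}$ is empty. Therefore, the former event is a subset of the event $\{l(\omega) > l_\delta\}$, and

$$P(A_\delta^s) \le \frac{\delta}{m}, \quad P(A_\delta) \le \sum_{s=0}^{m-1} P(A_\delta^s) \le \delta.$$

The lemma is proved. □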

Lemma 17.4.2 (Completeness of $\mathcal{X}$ with respect to convergence in probability) Let $\mathcal{X}$ be a complete metric space. If a sequence of $\mathcal{X}$-valued random elements $\eta_n$ is such that, for any $\delta > 0$,

$$P_n := \sup_{k \ge 0} P(\rho(\eta_{n+k}, \eta_n) > \delta) \to 0$$

as $n \to \infty$, then there exists a random element $\eta \in \mathcal{X}$ such that $\eta_n \xrightarrow{p} \eta$ (that is, $P(\rho(\eta_n, \eta) > \delta) \to 0$ as $n \to \infty$).

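Proof For given $\varepsilon$ and $\delta$ choose $n_k$, $k = 0, 1, \ldots$, such that

$$\sup_{s} P(\rho(\eta_{n_k+s}, \eta_{n_k}) > 2^{-k}\delta) < \varepsilon 2^{-k},$$

and, for the sake of brevity, put $\zeta_k := \eta_{n_k}$. Consider the set

$$D := \bigcap_{k=0}^{\infty} D_k, \quad D_k := \{\omega : \rho(\zeta_{k+1}, \zeta_k) \le 2^{-k}\delta\}.$$

Then $P(D) > 1 - 2\varepsilon$ and, for any $\omega \in D$, one has $\rho(\zeta_{k+s}(\omega), \zeta_k(\omega)) \le \delta 2^{1-k}$ for all $s \ge 1$. Hence $\zeta_k(\omega)$ is a Cauchy sequence in $\mathcal{X}$ and there exists an $\eta = \eta(\omega) \in \mathcal{X}$ such that $\zeta_k(\omega) \to \eta(\omega)$. Since $\varepsilon$ is arbitrary, this means that $\zeta_k \xrightarrow{a.s.} \eta$ as $k \to \infty$, and

$$P(\rho(\zeta_0, \eta) > 2\delta) \le P\Big(\bigcup_{k=0}^{\infty} \{\rho(\zeta_{k+1}, \zeta_k) > 2^{-k}\delta\}\Big) \le \sum_{k=0}^{\infty} P(\rho(\zeta_{k+1}, \zeta_k) > 2^{-k}\delta) \le 2\varepsilon.$$

Therefore, for any $n \ge n_0$,

$$P(\rho(\eta_n, \eta) > 3\delta) \le P(\rho(\eta_n, \eta_{n_0}) > \delta) + P(\rho(\zeta_0, \eta) > 2\delta) \le 3\varepsilon.$$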


Since $\varepsilon$ and $\delta$ are arbitrary, the lemma is proved. □

Proof of Theorem 17.4.1 From Lemma 17.4.1 it follows that

$$\sup_{k} P(\rho(Y_{n+k}, Y_n) > \delta) \to 0 \quad \text{as } n \to \infty.$$

This means that $Y_n$ is a Cauchy sequence with respect to convergence in probability, and by Lemma 17.4.2 there exists a random variable $X^0$ such that

$$Y_n \xrightarrow{p} X^0, \qquad U^{-n} X_{n+s}(x_0) = U^s(U^{-n-s} X_{n+s}(x_0)) = U^s Y_{n+s} \xrightarrow{p} U^s X^0 \equiv X^s. \tag{17.4.5}$$

By continuity of $f$,

$$U^{-n} X_{n+s+1}(x_0) = U^{-n} f(X_{n+s}(x_0), \xi_{n+s}) = f(U^{-n} X_{n+s}(x_0), \xi_s) \xrightarrow{p} f(X^s, \xi_s) = X^{s+1}.$$

We proved the required convergence for a fixed initial value $x_0$. For an arbitrary $x \in C_N = \{z : \rho(x_0, z) \le N\}$, one has

$$\rho(U^{-n} X_n(x), X^0) \le \rho(U^{-n} X_n(x), U^{-n} X_n(x_0)) + \rho(U^{-n} X_n(x_0), X^0), \tag{17.4.6}$$

where the first term on the right-hand side converges in probability to 0 uniformly in $x \in C_N$. For $n = lm$ this follows from the inequality (see condition (C))

$$\rho(U^{-n} X_n(x), U^{-n} X_n(x_0)) \le N \exp\Big\{\sum_{j=1}^{l} \lambda_j\Big\} \tag{17.4.7}$$

and the above argument. Similar relations hold for $n = lm + s$, $m > s > 0$. This, together with (17.4.5) and (17.4.6), implies that

$$U^{-n} X_{n+s}(x) \xrightarrow{p} X^s = U^s X^0$$

uniformly in $x \in C_N$. This proves the assertion of the theorem in regard to convergence in probability.

We now prove convergence with probability 1. To this end, one should repeat the argument proving Lemma 17.4.1, but bounding $\rho(X^0, U^{-n} X_n(x))$ rather than $\rho(Y_{n+k}, Y_n)$. Assuming for simplicity’s sake that $s = 0$ ($n$ is a multiple of $m$), we get (similarly to (17.4.4)) that, for any $x$,

$$\rho(X^0, U^{-n} X_n(x)) \le \rho(x, U^{-n} X^0) \exp\Big\{\sum_{j=1}^{n/m} \lambda_j\Big\}. \tag{17.4.8}$$

The rest of the argument of Lemma 17.4.1 remains unchanged. This implies that, for any $\delta > 0$ and sufficiently large $n_\delta$,

$$P\Big(\bigcup_{n \ge n_\delta} \{\rho(X^0, U^{-n} X_n(x)) \ge \delta\}\Big) < \delta.$$

Theorem 17.4.1 is proved. □


Example 17.4.1 (Generalised autoregression) Let $\mathcal{X} = \mathbb{R}$. A generalised autoregression process is defined by the relations

$$X_{n+1} = G(\zeta_n F(X_n) + \eta_n), \tag{17.4.9}$$

where $F$ and $G$ are functions mapping $\mathbb{R} \to \mathbb{R}$ and $\xi_n = (\zeta_n, \eta_n)$ is a stationary ergodic driving sequence, so that $\{X_n\}$ is an s.r.s. with the function

$$f(x, y) = G(y_1 F(x) + y_2), \quad y = (y_1, y_2) \in \mathcal{Y} = \mathbb{R}^2.$$

If the functions $F$ and $G$ are nondecreasing and left continuous, $G(x) \ge 0$ for all $x \in \mathbb{R}$, and the elements $\zeta_n$ are nonnegative, then the process (17.4.9) satisfies the condition of Theorem 17.3.1, and therefore $U^{-n+s} X_n(0) \uparrow X^s$ with probability 1 (as $n \to \infty$). To establish convergence to a proper stationary sequence $X^s$, one has to prove uniform boundedness in probability (in $n$) of the sequence $X_n(0)$ (see below).

Now we will establish under what conditions the sequence (17.4.9) will satisfy the conditions of Theorem 17.4.1. Suppose that the functions $F$ and $G$ satisfy the Lipschitz condition:

$$|G(x_1) - G(x_2)| \le c_G |x_1 - x_2|, \quad |F(x_1) - F(x_2)| \le c_F |x_1 - x_2|.$$

Then

$$|f(x_1, \xi_0) - f(x_2, \xi_0)| \le c_G |\zeta_0 (F(x_1) - F(x_2))| \le c_F c_G |\zeta_0| |x_1 - x_2|. \tag{17.4.10}$$
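Inequality (17.4.10) can be spot-checked on random inputs. The sketch below assumes two specific Lipschitz functions chosen only for illustration, $F(x) = \arctan x$ (so $c_F = 1$) and $G(x) = |x|/2$ (so $c_G = 1/2$), together with Gaussian test values for $\zeta$ and $\eta$; none of these choices come from the text.

```python
import math
import random

C_F, C_G = 1.0, 0.5          # Lipschitz constants of the illustrative F and G below

def F(x):
    return math.atan(x)      # |F(x1) - F(x2)| <= |x1 - x2|

def G(x):
    return 0.5 * abs(x)      # |G(x1) - G(x2)| <= 0.5 * |x1 - x2|

def f(x, zeta, eta):
    """One step of the generalised autoregression (17.4.9)."""
    return G(zeta * F(x) + eta)

rng = random.Random(5)
worst = 0.0                  # largest observed value of lhs - rhs (should not be positive)
for _ in range(10000):
    x1, x2 = rng.uniform(-20, 20), rng.uniform(-20, 20)
    zeta, eta = rng.gauss(0, 2), rng.gauss(0, 2)
    lhs = abs(f(x1, zeta, eta) - f(x2, zeta, eta))
    rhs = C_F * C_G * abs(zeta) * abs(x1 - x2)
    worst = max(worst, lhs - rhs)
print(worst <= 1e-12)
```

With these constants the contraction coefficient is $q = |\zeta_0|/2$, so the requirement $E \ln q < 0$ of condition (C) becomes $E \ln |\zeta_0| < \ln 2$.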


Theorem 17.4.2 Under the above assumptions, the sequence (17.4.9) will satisfy condition (C) if

$$\ln c_G c_F + E \ln |\zeta_0| < 0. \tag{17.4.11}$$

The sequence (17.4.9) will satisfy condition (B) if (17.4.11) holds and, moreover,

$$E(\ln |\eta_0|)^+ < \infty. \tag{17.4.12}$$

When (17.4.11) and (17.4.12) hold, the sequence (17.4.9) has a stationary majorant, i.e. there exists a stationary sequence $M_n$ (depending on $X_0$) such that $|X_n| \le M_n$ for all $n$.


Proof That condition (C) for $\rho(x_1, x_2) = |x_1 - x_2|$ follows from (17.4.10) is obvious. We prove (B). To do this, we will construct a stationary majorant for $|X_n|$. One could do this using Theorems 17.2.2 and 17.2.2A. In our case, it is simpler to prove it directly, making use of the inequalities

$$|G(x)| \le |G(0)| + c_G |x|, \quad |F(x)| \le |F(0)| + c_F |x|,$$

where we assume, for simplicity’s sake, that $G(0)$ and $F(0)$ are finite. Then

$$\begin{aligned} |X_{n+1}| &\le |G(0)| + c_G |\zeta_n| \cdot |F(X_n)| + c_G |\eta_n| \\ &\le |G(0)| + c_G c_F |\zeta_n| \cdot |X_n| + c_G |\zeta_n| \cdot |F(0)| + c_G |\eta_n| = \beta_n |X_n| + \gamma_n, \end{aligned}$$

where

$$\beta_n := c_G c_F |\zeta_n| \ge 0, \quad \gamma_n := |G(0)| + c_G |\zeta_n| \cdot |F(0)| + c_G |\eta_n| \ge 0;$$

$$E \ln \beta_n < 0, \quad E(\ln \gamma_n)^+ < \infty.$$
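From this we get that, for $X_0 = x$,

$$|X_{n+1}| \le |x| \prod_{j=0}^{n} \beta_j + \sum_{l=0}^{n-1} \Big(\prod_{j=n-l}^{n} \beta_j\Big) \gamma_{n-l-1} + \gamma_n,$$

$$U^{-n} |X_{n+1}| \le |x| \prod_{j=-n}^{0} \beta_j + \sum_{l=0}^{\infty} \Big(\prod_{j=-l}^{0} \beta_j\Big) \gamma_{-l-1} + \gamma_0. \tag{17.4.13}$$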

Put

$$\alpha_j := \ln \beta_j, \quad S_l := \sum_{j=-l}^{0} \alpha_j.$$

By the strong law of large numbers, there are only finitely many positive values $S_l - al$, where $2a = E\alpha_j < 0$. Therefore, for all $l$ except for those with $S_l - al > 0$,

$$\prod_{j=-l}^{0} \beta_j < e^{al}.$$

On the other hand, $\gamma_{-l-1}$ exceeds the level $l$ only finitely often. This means that the series in (17.4.13) (denote it by $R$) converges with probability 1. Moreover,

$$S = \sup_{k \ge 0} S_k \ge S_n$$

is a proper random variable. As a result, we obtain that, for all $n$,

$$U^{-n} |X_{n+1}| \le |x| e^S + R + \gamma_0,$$

where all the terms on the right-hand side are proper random variables. The required majorant

$$M_n := U^{n-1}(|x| e^S + R + \gamma_0)$$

is constructed. This implies that (B) is met. The theorem is proved. □

The assertion of Theorem 17.4.2 can be extended to the multivariable case $\mathcal{X} = \mathbb{R}^d$, $d > 1$, as well (see [6]).

Note that conditions (17.4.11) and (17.4.12) are, in a certain sense, necessary not only for convergence $U^{-n+s}X_n(x) \to X^s$, but also for the boundedness of $X_n(x)$ (or of $X^0$) only. This fact can be best illustrated in the case when $F(t) = G(t) = t$. In that case, $U^{-n}X_{n+s+1}(x)$ and $X^{s+1}$ admit explicit representations

$$U^{-n}X_{n+s+1}(x) = x \prod_{j=-n}^{s} \zeta_j + \sum_{l=0}^{n+s} \Biggl(\prod_{j=s-l}^{s} \zeta_j\Biggr) \eta_{s-l-1} + \eta_s,$$

$$X^{s+1} = \sum_{l=0}^{\infty} \Biggl(\prod_{j=s-l}^{s} \zeta_j\Biggr) \eta_{s-l-1} + \eta_s.$$

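Assume that $\mathbf{E}\ln\zeta \ge 0$, $\eta \equiv 1$, and put

$$s := 0, \qquad z_j := \ln\zeta_j, \qquad Z_l := \sum_{j=-l}^{0} z_j.$$

Then

$$X^1 = 1 + \sum_{l=0}^{\infty} e^{Z_l}, \qquad \text{where} \quad \sum_{l=0}^{\infty} \mathrm{I}(Z_l \ge 0) = \infty$$

with probability 1, and consequently $X^1 = \infty$ and $X_n \to \infty$ with probability 1. If $\mathbf{E}[\ln\eta]^+ = \infty$ and $\zeta \equiv b < 1$ then

$$X^1 = \eta_0 + \sum_{l=0}^{\infty} \exp\{y_{-l-1} + (l+1)\ln b\},$$

where $y_j = \ln\eta_j$; the event $\{y_{-l-1} > -l\ln b\}$ occurs infinitely often with probability 1. This means that $X^1 = \infty$ and $X_n \to \infty$ with probability 1.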


Chapter  18 

Continuous  Time  Random  Processes 


Abstract  This  chapter  presents  elements  of  the  general  theory  of  continuous  time 
processes.  Section  18.1  introduces  the  key  concepts  of  random  processes,  sample 
paths,  cylinder  sets  and  finite-dimensional  distributions,  the  spaces  of  continuous 
functions  and  functions  without  discontinuities  of  the  second  kind,  and  equivalence 
of  random  processes.  Section  18.2  presents  the  fundamental  results  on  regularity 
of processes: Kolmogorov’s theorem on existence of a continuous modification and Kolmogorov-Chentsov’s theorem on existence of an equivalent process with trajectories without discontinuities of the second kind. The section also contains discussions of the notions of separability, stochastic continuity and continuity in mean.


18.1  General  Definitions 

Definition 18.1.1 A random process¹ is a family of random variables $\xi(t) = \xi(t, \omega)$ given on a common probability space $(\Omega, \mathfrak{F}, \mathbf{P})$ and depending on a parameter $t$ taking values in some set $T$.

A random process will be written as $\{\xi(t),\ t \in T\}$.

The sequences of random variables $\xi_1, \xi_2, \ldots$ considered in the previous sections are random processes for which $T = \{1, 2, 3, \ldots\}$. The same is true of the sums $S_1, S_2, \ldots$ of $\xi_1, \xi_2, \ldots$ Markov chains $\{X_n;\ n = 0, 1, \ldots\}$, martingales $\{X_n;\ n \in N\}$, stationary and stochastic recursive sequences described in previous chapters are also random processes. The processes for which the set $T$ can be identified with the whole sequence $\{\ldots, -1, 0, 1, \ldots\}$ or a part thereof are usually called random processes in discrete time, or random sequences.

If $T$ coincides with a certain real interval $T = [a, b]$ (this may be the whole real line $-\infty < t < \infty$ or the half-line $t \ge 0$), then the collection $\{\xi(t),\ t \in T\}$ is said to be a process in continuous time.

Simple examples of such objects are renewal processes $\{\eta(t),\ t \ge 0\}$ described in Chap. 10.


¹ As well as the term “random process” one also often uses the terms “stochastic” or “probabilistic” processes.

In the present chapter we will be considering continuous time processes only. Interpretation of the parameter $t$ as time is, of course, not imperative. It appeared historically because in most problems from the natural sciences which led to the concept of random process the parameter $t$ had the meaning of time, and the value $\xi(t)$ was what one would observe at time $t$.

The movement of a gas molecule as time passes, the storage level in a water reservoir, oscillations of an airplane’s wing, etc. could be viewed as examples of real-world random processes.

The random function

$$\xi(t) = \sum_{k=1}^{\infty} 2^{-k}\xi_k \sin kt, \qquad t \in [0, 2\pi],$$

where the $\xi_k$ are independent and identically distributed, is also an example of a random process.
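To make the notion of a trajectory concrete, the following sketch evaluates two sample paths of a random series of this kind on a grid. It is only an illustration under stated assumptions: the coefficients $2^{-k}$, the standard normal law of $\xi_k$, the truncation level `n_terms` and the function name `sample_path` are choices made here, not part of the text.

```python
import math
import random

def sample_path(ts, n_terms=30, rng=None):
    """One truncated trajectory sum_{k=1}^{K} 2^{-k} xi_k sin(k t) on the grid ts.

    Assumptions: xi_k are i.i.d. standard normal, K = n_terms.
    """
    if rng is None:
        rng = random.Random()
    # Fixing the coefficients corresponds to fixing the elementary outcome omega.
    coeffs = [rng.gauss(0.0, 1.0) for _ in range(n_terms)]
    return [
        sum(2.0 ** -k * c * math.sin(k * t) for k, c in enumerate(coeffs, start=1))
        for t in ts
    ]

grid = [2 * math.pi * i / 200 for i in range(201)]
rng = random.Random(0)
path_a = sample_path(grid, rng=rng)  # trajectory for one omega
path_b = sample_path(grid, rng=rng)  # trajectory for another omega
```

Each call draws a new outcome $\omega$ (a new set of coefficients) and returns the whole function $t \mapsto \xi(t, \omega)$ sampled on the grid, which is the sense in which the random values are functions.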

Consider a random process $\{\xi(t),\ t \in T\}$. If $\omega \in \Omega$ is fixed, we obtain a function $\xi(t)$, $t \in T$, which is often called a sample function, trajectory or path of the process. Thus, the random values here are functions. As before, we could consider here a sample probability space, which can be constructed for example as follows. Consider the space $X$ of functions $x(t)$, $t \in T$, to which the trajectories $\xi(t)$ belong. Let, further, $\mathfrak{B}^T_X$ be the σ-algebra of subsets of $X$ generated by the sets of the form

$$C = \{x \in X : x(t_1) \in B_1, \ldots, x(t_n) \in B_n\} \tag{18.1.1}$$

for any $n$, any $t_1, \ldots, t_n$ from $T$, and any Borel sets $B_1, \ldots, B_n$. Sets of this form are called cylinders; various finite unions of cylinder sets form an algebra generating $\mathfrak{B}^T_X$. If a process $\xi(t, \omega)$ is given, it defines a measurable mapping of $(\Omega, \mathfrak{F})$ into $(X, \mathfrak{B}^T_X)$, since clearly $\xi^{-1}(C) = \{\omega : \xi(\cdot, \omega) \in C\} \in \mathfrak{F}$ for any cylinder $C$, and therefore $\xi^{-1}(B) \in \mathfrak{F}$ for any $B \in \mathfrak{B}^T_X$. This mapping induces a distribution $\mathbf{P}_\xi$ on $(X, \mathfrak{B}^T_X)$ defined by the equalities $\mathbf{P}_\xi(B) = \mathbf{P}(\xi^{-1}(B))$. The triplet $(X, \mathfrak{B}^T_X, \mathbf{P}_\xi)$ is called the sample probability space. In that space, an elementary outcome $\omega$ is identified with the trajectory of the process, and the measure $\mathbf{P}_\xi$ is said to be the distribution of the process $\xi$.

Now if, considering the process $\{\xi(t)\}$, we fix the time epochs $t_1, t_2, \ldots, t_n$, we will get a multi-dimensional random variable $(\xi(t_1, \omega), \ldots, \xi(t_n, \omega))$. The distributions of such variables are said to be the finite-dimensional distributions of the process.

The following function spaces are most often considered as spaces $X$ in the theory of random processes with continual sets $T$.

1. The set of all functions on $T$:

$$X = \mathbb{R}^T = \prod_{t \in T} R_t,$$

where $R_t$ are copies of the real line $(-\infty, \infty)$. This space is usually considered with the σ-algebra $\mathfrak{B}^T_{\mathbb{R}}$ of subsets of $\mathbb{R}^T$ generated by cylinders.


2. The space $C(T)$ of continuous functions on $T$ (we will write $C(a, b)$ if $T = [a, b]$). In this space, along with the σ-algebra $\mathfrak{B}^T_C$ generated by cylinder subsets of $C(T)$ (this σ-algebra is smaller than the similar σ-algebra in $\mathbb{R}^T$), one also often considers the σ-algebra $\mathfrak{B}_{C(T)}$ (the Borel σ-algebra) generated by the sets open with respect to the uniform distance

$$\rho(x, y) := \sup_{t \in T} |y(t) - x(t)|, \qquad x, y \in C(T).$$

It turns out that, in the space $C(T)$, we always have $\mathfrak{B}_{C(T)} = \mathfrak{B}^T_C$ (see, e.g., [14]).

3. The space $D(T)$ of functions having left and right limits $x(t - 0)$ and $x(t + 0)$ at each point $t$, the value $x(t)$ being equal either to $x(t - 0)$ or to $x(t + 0)$. If $T = [a, b]$, it is also assumed that $x(a) = x(a + 0)$ and $x(b) = x(b - 0)$. This space is often called the space of functions without discontinuities of the second kind.² The space of functions for which at all other points $x(t) = x(t - 0)$ ($x(t) = x(t + 0)$) will be denoted by $D_-(T)$ ($D_+(T)$). The space $D_+(T)$ ($D_-(T)$) will be called the space of right-continuous (left-continuous) functions. For example, the trajectories of the renewal processes discussed in Chap. 10 belong to $D_+(0, \infty)$.

In the space $D(T)$ one can also construct the Borel σ-algebra with respect to an appropriate metric, but we will restrict ourselves to using the σ-algebra $\mathfrak{B}^T_D$ of cylindric subsets of $D(T)$.

Now we can formulate the following equivalent definition of a random process. Let $X$ be a given function space, and $\mathfrak{G}$ be the σ-algebra of its subsets containing the σ-algebra $\mathfrak{B}^T_X$ of cylinders.


Definition 18.1.2 A random process $\xi(t) = \xi(t, \omega)$ is a measurable (in $\omega$) mapping of $(\Omega, \mathfrak{F}, \mathbf{P})$ into $(X, \mathfrak{G}, \mathbf{P}_\xi)$ (to each $\omega$ one puts into correspondence a function $\xi(t) = \xi(t, \omega)$ so that $\xi^{-1}(G) = \{\omega : \xi(\cdot) \in G\} \in \mathfrak{F}$ for $G \in \mathfrak{G}$). The distribution $\mathbf{P}_\xi$ is said to be the distribution of the process.

The condition $\mathfrak{B}^T_X \subset \mathfrak{G}$ is needed to ensure that the probabilities of cylinder sets and, in particular, the probabilities $\mathbf{P}(\xi(t) \in B)$ for Borel sets $B$ are correctly defined, which means that $\xi(t)$ are random variables.

So far we have tacitly assumed that the process is given and it is known that its trajectories lie in $X$. However, this is rarely the case. More often one tries to describe the process $\xi(t)$ in terms of some characteristics of its distribution. One could, for example, specify the finite-dimensional distributions of the process. From Kolmogorov’s theorem on consistent distributions³ (see Appendix 2), it follows that finite-dimensional distributions uniquely specify the distribution $\mathbf{P}_\xi$ of the process on the space $(\mathbb{R}^T, \mathfrak{B}^T_{\mathbb{R}})$. That theorem can be considered as the existence theorem for random processes in $(\mathbb{R}^T, \mathfrak{B}^T_{\mathbb{R}})$ with prescribed finite-dimensional distributions.

² A discontinuity of the second kind is associated with either non-fading oscillations of increasing frequency or escape to infinity.

³ Recall the definition of consistent distributions. Let $R_t$, $t \in T$, be real lines and $\mathfrak{B}_t$ the σ-algebras of Borel subsets of $R_t$. Let $T_n = \{t_1, \ldots, t_n\}$ be a finite subset of $T$. The finite-dimensional distribution of $(\xi(t_1, \omega), \ldots, \xi(t_n, \omega))$ is the distribution $\mathbf{P}_{T_n}$ on $(\mathbb{R}^{T_n}, \mathfrak{B}^{T_n})$, where $\mathbb{R}^{T_n} = \prod_{t \in T_n} R_t$ and $\mathfrak{B}^{T_n} = \prod_{t \in T_n} \mathfrak{B}_t$. Let two finite subsets $T'$ and $T''$ of $T$ be given, and $(\mathbb{R}', \mathfrak{B}')$ and $(\mathbb{R}'', \mathfrak{B}'')$ be the respective subspaces of $(\mathbb{R}^T, \mathfrak{B}^T)$. The distributions $\mathbf{P}_{T'}$ and $\mathbf{P}_{T''}$ on $(\mathbb{R}', \mathfrak{B}')$ and $(\mathbb{R}'', \mathfrak{B}'')$ are said to be consistent if their projections on the common part of subspaces $\mathbb{R}'$ and $\mathbb{R}''$ (if it exists) coincide.

The space $(\mathbb{R}^T, \mathfrak{B}^T_{\mathbb{R}})$ is, however, not quite convenient for studying random processes. The fact is that by no means all relations frequently used in analysis generate events, i.e. the sets which belong to the σ-algebra $\mathfrak{B}^T_{\mathbb{R}}$ and whose probabilities are defined. Based on the definition, we can be sure that only the elements of the σ-algebra generated by $\{\xi(t) \in B\}$, $t \in T$, $B$ being Borel sets, are events. The set $\{\sup_{t \in T} \xi(t) < c\}$, for instance, does not have to be an event, for we only know its representation in the form $\bigcap_{t \in T}\{\xi(t) < c\}$, which is the intersection of an uncountable collection of measurable sets when $T$ is an interval on the real line.

Another inconvenience occurs as well: the distribution $\mathbf{P}_\xi$ on $(\mathbb{R}^T, \mathfrak{B}^T_{\mathbb{R}})$ does not uniquely specify the properties of the trajectories of $\xi(t)$. The reason is that the space $\mathbb{R}^T$ is very rich, and if we know that $x(\cdot)$ belongs to a set of the form (18.1.1), this gives us no information about the behaviour of $x(t)$ at points $t$ different from $t_1, \ldots, t_n$. The same is true of arbitrary sets $A$ from $\mathfrak{B}^T_{\mathbb{R}}$: roughly speaking, the relation $x(\cdot) \in A$ can determine the values of $x(t)$ at most at a countable set of points. We will see below that even such a set as $\{x(t) \equiv 0\}$ does not belong to $\mathfrak{B}^T_{\mathbb{R}}$. To specify the behaviour of the entire trajectory of the process, it is not sufficient to give a distribution on $\mathfrak{B}^T_{\mathbb{R}}$; one has to extend this σ-algebra.

Prior to presenting the respective example, we will give the following definition.

Definition 18.1.3 Processes $\xi(t)$ and $\eta(t)$ are said to be equivalent (or stochastically equivalent) if $\mathbf{P}(\xi(t) = \eta(t)) = 1$ for all $t \in T$. In this case the process $\eta$ is called a modification of $\xi$.

Finite-dimensional distributions of equivalent processes clearly coincide, and therefore the distributions $\mathbf{P}_\xi$ and $\mathbf{P}_\eta$ on $(\mathbb{R}^T, \mathfrak{B}^T_{\mathbb{R}})$ coincide, too.

Example 18.1.1 Put

$$x_a(t) := \begin{cases} 0 & \text{if } t \ne a, \\ 1 & \text{if } t = a, \end{cases}$$

and complete $\mathfrak{B}^T_{\mathbb{R}}$ with the elements $x_a(t)$, $a \in [0, 1]$, and the element $x^0(t) \equiv 0$. Let $\gamma$ be uniformly distributed on $[0, 1]$. Consider two random processes $\xi_0(t)$ and $\xi_1(t)$ defined as follows: $\xi_0(t) = x^0(t)$, $\xi_1(t) = x_\gamma(t)$. Then clearly

$$\mathbf{P}(\xi_0(t) = \xi_1(t)) = \mathbf{P}(\gamma \ne t) = 1,$$

the processes $\xi_0$ and $\xi_1$ are equivalent, and hence their distributions on $(\mathbb{R}^T, \mathfrak{B}^T_{\mathbb{R}})$ coincide. However, we see that the trajectories of the processes are substantially different.

It is easy to see from the above example that the set of all continuous functions $C(T)$, the set $\{\sup_{t \in [0,1]} x(t) < c\}$, the one-point set $\{x(t) \equiv 0\}$ and many others do not belong to $\mathfrak{B}^T_{\mathbb{R}}$. Indeed, if we assume the contrary (say, that $C(T) \in \mathfrak{B}^T_{\mathbb{R}}$) then we would get from the equivalence of $\xi_0$ and $\xi_1$ that $\mathbf{P}(\xi_0 \in C(0, 1)) = \mathbf{P}(\xi_1 \in C(0, 1))$, while the former of these probabilities is 1 and the latter is 0.
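The contrast in Example 18.1.1 between agreement at each fixed $t$ and disagreement of whole trajectories can be checked numerically. The sketch below is an illustration only; the sample size, the seed, the fixed point `t_fixed` and the function names are assumptions made here.

```python
import random

def xi0(t, gamma):
    """Trajectory x^0: identically zero, whatever gamma is."""
    return 0.0

def xi1(t, gamma):
    """Trajectory x_gamma: equal to 1 at the single point t = gamma, 0 elsewhere."""
    return 1.0 if t == gamma else 0.0

rng = random.Random(1)
gammas = [rng.random() for _ in range(10_000)]  # gamma uniform on [0, 1)

# At a fixed time point the two processes agree in (almost) every outcome.
t_fixed = 0.5
agree_at_fixed_t = sum(xi0(t_fixed, g) == xi1(t_fixed, g) for g in gammas) / len(gammas)

# The supremum over t of xi1 is attained at t = gamma and equals 1 in every outcome,
# while the supremum of xi0 is 0.
sup_xi1 = [xi1(g, g) for g in gammas]
sup_xi0 = [xi0(g, g) for g in gammas]
```

The fraction `agree_at_fixed_t` reflects $\mathbf{P}(\gamma \ne t) = 1$, whereas the suprema differ in every outcome, so an event such as $\{\sup_t x(t) < c\}$ with $0 < c \le 1$ separates the two processes even though their finite-dimensional distributions coincide.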

The simplest way of overcoming the above difficulties and inconveniences is to define the processes in the spaces $C(T)$ or $D(T)$ when it is possible. If, for example, $\xi(t) \in C(T)$ and $\eta(t) \in C(T)$, and they are equivalent, then the trajectories of the processes will completely coincide with probability 1, since in that case

$$\bigcap_{\text{rational } t} \{\xi(t) = \eta(t)\} = \bigcap_{t \in T} \{\xi(t) = \eta(t)\} = \{\xi(t) = \eta(t) \text{ for all } t \in T\},$$

where the probability of the event on the left-hand side is defined (this is the probability of the intersection of a countable collection of sets) and equals 1. Similarly, the probabilities, say, of the events

$$\Bigl\{\sup_{t \in T} \xi(t) < c\Bigr\} = \bigcap_{t \in T} \{\xi(t) < c\}$$

are also defined.

The same argument holds for the spaces $D(T)$, because each element $x(\cdot)$ of $D$ is uniquely determined by its values $x(t)$ on a countable everywhere dense set of $t$ values (for example, on the set of rationals).

Now assume that we have somehow established that the original process $\xi(t)$ (let it be given on $(\mathbb{R}^T, \mathfrak{B}^T_{\mathbb{R}})$) has a continuous modification, i.e. an equivalent process $\eta(t)$ such that its trajectories are continuous with probability 1 (or belong to the space $D(T)$). The above means, first of all, that we have somehow extended the σ-algebra $\mathfrak{B}^T_{\mathbb{R}}$ (adding, say, the set $C(T)$) and now consider the distribution of $\xi$ on the σ-algebra $\widetilde{\mathfrak{B}}^T = \sigma(\mathfrak{B}^T_{\mathbb{R}}, C(T))$ (otherwise the above would not make sense). But the extension of the distribution of $\xi$ from $(\mathbb{R}^T, \mathfrak{B}^T_{\mathbb{R}})$ to $(\mathbb{R}^T, \widetilde{\mathfrak{B}}^T)$ may not be unique. (We saw this in Example 18.1.1; the extension can be given by, say, putting $\mathbf{P}(\xi \in C(T)) = 0$.) What we said above about the process $\eta$ means that there exists an extension $\mathbf{P}_\eta$ such that $\mathbf{P}_\eta(C(T)) = \mathbf{P}(\eta \in C(T)) = 1$.

Further, it is often better not to deal with the inconvenient space $(\mathbb{R}^T, \mathfrak{B}^T_{\mathbb{R}})$ at all. To avoid it, one can define the distribution of the process $\eta$ on the restricted space $(C(T), \mathfrak{B}^T_C)$. It is clear that

$$\mathfrak{B}^T_C \subset \widetilde{\mathfrak{B}}^T = \sigma(\mathfrak{B}^T_{\mathbb{R}}, C(T)), \qquad \mathfrak{B}^T_C = \mathfrak{B}^T_{\mathbb{R}} \cap C(T)$$

(the former σ-algebra is generated by sets of the form (18.1.1) intersected with $C(T)$). Therefore, considering the distribution of $\eta$ concentrated on $C(T)$, we can deal with the restriction of the space $(\mathbb{R}^T, \widetilde{\mathfrak{B}}^T)$ to $(C(T), \mathfrak{B}^T_C)$ and define the probability on the latter as $\mathbf{P}_\eta(A) = \mathbf{P}(\eta \in A)$, $A \in \mathfrak{B}^T_C \subset \widetilde{\mathfrak{B}}^T$. Thus we have constructed a process $\eta$ with continuous trajectories which is equivalent to the original process $\xi$ (if we consider their distributions in $(\mathbb{R}^T, \mathfrak{B}^T_{\mathbb{R}})$).

To realise this construction, one has now to learn how to find from the distribution of a process $\xi$ whether it has a continuous modification $\eta$ or not.

Before stating and proving the respective theorems, note once again that the above-mentioned difficulties are mainly of a mathematical character, i.e. related to the mathematical model of the random process. In real life problems, it is usually clear in advance whether the process under consideration is continuous or not. If it is “physically” continuous, and we want to construct an adequate model, then, of course, of all modifications of the process we have to take the continuous one.

The same argument remains valid if, instead of continuous trajectories, one considers trajectories from $D(T)$. The problem essentially remains the same: the difficulties are eliminated if one can describe the entire trajectory of the process $\xi(\cdot)$ by the values $\xi(t)$ on some countable set of $t$ values. Processes possessing this property will be called regular.


18.2  Criteria  of  Regularity  of  Processes 


First  we  will  find  conditions  under  which  a  process  has  a  continuous  modification. 
Without  loss  of  generality,  we  will  assume  that  T  is  the  segment  T  =  [0,  1]. 

A very simple criterion for the existence of a continuous modification is based on the knowledge of two-dimensional distributions of $\xi(t)$ only.


Theorem 18.2.1 (Kolmogorov) Let $\xi(t)$ be a random process given on $(\mathbb{R}^T, \mathfrak{B}^T_{\mathbb{R}})$ with $T = [0, 1]$. If there exist $a > 0$, $b > 0$ and $c < \infty$ such that, for all $t$ and $t + h$ from the segment $[0, 1]$,

$$\mathbf{E}|\xi(t + h) - \xi(t)|^a \le c|h|^{1+b}, \tag{18.2.1}$$

then $\xi(\cdot)$ has a continuous modification.


We will obtain this assertion as a consequence of a more general theorem, of which the conditions are somewhat more difficult to comprehend, but have essentially the same meaning as (18.2.1).


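Theorem 18.2.2 Let for all $t$, $t + h \in [0, 1]$,

$$\mathbf{P}\bigl(|\xi(t + h) - \xi(t)| > \varepsilon(h)\bigr) \le q(h),$$

where $\varepsilon(h)$ and $q(h)$ are decreasing even functions of $h$ such that

$$\sum_{n=1}^{\infty} \varepsilon(2^{-n}) < \infty, \qquad \sum_{n=1}^{\infty} 2^n q(2^{-n}) < \infty.$$

Then $\xi(\cdot)$ has a continuous modification.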


Proof We will make use of approximations of $\xi(t)$ by continuous processes. Put

$$t_{n,r} := r2^{-n}, \qquad r = 0, 1, \ldots, 2^n,$$

$$\xi_n(t) := \xi(t_{n,r}) + 2^n(t - t_{n,r})\bigl[\xi(t_{n,r+1}) - \xi(t_{n,r})\bigr] \quad \text{for } t \in [t_{n,r}, t_{n,r+1}].$$

Fig. 18.1 Illustration to the proof of Theorem 18.2.2: construction of piece-wise linear approximations to the process $\xi(t)$
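The piecewise-linear approximations $\xi_n$ used in this proof can be sketched in code as follows. This is a minimal illustration, not part of the proof: the deterministic function `path` merely stands in for one fixed trajectory, and all names (`dyadic_interpolant`, `sup_distance`, `gaps`) as well as the grid size are assumptions made here.

```python
import math

def dyadic_interpolant(xi, n):
    """Return xi_n: linear on each [r 2^-n, (r+1) 2^-n] and equal to xi at the knots."""
    step = 2.0 ** -n
    knots = [xi(r * step) for r in range(2 ** n + 1)]

    def xi_n(t):
        # Index of the dyadic segment containing t (the right end point t = 1
        # is assigned to the last segment).
        r = min(int(t / step), 2 ** n - 1)
        return knots[r] + (t - r * step) / step * (knots[r + 1] - knots[r])

    return xi_n

def sup_distance(f, g, points):
    """Uniform distance between f and g, approximated on a finite set of points."""
    return max(abs(f(t) - g(t)) for t in points)

# Hypothetical stand-in for a trajectory on [0, 1]: continuous, with a kink at t = 0.3.
path = lambda t: math.sin(7 * t) + abs(t - 0.3)

grid = [i / 4096 for i in range(4097)]
# gaps[k] approximates rho(xi_{n+1}, xi_n) for n = k + 1.
gaps = [
    sup_distance(dyadic_interpolant(path, n + 1), dyadic_interpolant(path, n), grid)
    for n in range(1, 8)
]
```

For a continuous trajectory the successive distances in `gaps` become small, which is the behaviour the proof derives with probability 1 from the summability of $\varepsilon(2^{-n})$.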


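From Fig. 18.1 we see that

$$|\xi_{n+1}(t) - \xi_n(t)| \le \Bigl|\xi(t_{n+1,2r+1}) - \tfrac{1}{2}\bigl[\xi(t_{n+1,2r}) + \xi(t_{n+1,2r+2})\bigr]\Bigr| \le \tfrac{1}{2}(\alpha + \beta),$$

where $\alpha := |\xi(t_{n+1,2r+1}) - \xi(t_{n+1,2r})|$, $\beta := |\xi(t_{n+1,2r+1}) - \xi(t_{n+1,2r+2})|$. This implies that

$$Z_n := \max_{t \in [t_{n,r},\, t_{n,r+1}]} |\xi_{n+1}(t) - \xi_n(t)| \le \tfrac{1}{2}(\alpha + \beta),$$

$$\mathbf{P}\bigl(Z_n > \varepsilon(2^{-n})\bigr) \le \mathbf{P}\bigl(\alpha > \varepsilon(2^{-n})\bigr) + \mathbf{P}\bigl(\beta > \varepsilon(2^{-n})\bigr) \le 2q(2^{-n})$$

(note that since the trajectories of $\xi_n(t)$ are continuous, $\{Z_n > \varepsilon(2^{-n})\} \in \mathfrak{B}^T_{\mathbb{R}}$, which is not the case in the general situation). Since here we have altogether $2^n$ segments of the form $[t_{n,r}, t_{n,r+1}]$, $r = 0, 1, \ldots, 2^n - 1$, one has

$$\mathbf{P}\Bigl(\max_{t \in [0,1]} |\xi_{n+1}(t) - \xi_n(t)| > \varepsilon(2^{-n})\Bigr) \le 2^{n+1} q(2^{-n}).$$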


Because $\sum_{n=1}^{\infty} 2^n q(2^{-n}) < \infty$, by the Borel-Cantelli criterion, for almost all $\omega$ (i.e. for $\omega \in A$, $\mathbf{P}(A) = 1$), there exists an $n(\omega)$ such that, for all $n \ge n(\omega)$,

$$\rho(\xi_{n+1}, \xi_n) = \max_{t \in [0,1]} |\xi_{n+1}(t) - \xi_n(t)| \le \varepsilon(2^{-n}).$$

From this it follows that $\xi_n$ is a Cauchy sequence a.s., since

$$\rho(\xi_n, \xi_m) \le \varepsilon_n := \sum_{k=n}^{\infty} \varepsilon(2^{-k}) \to 0$$

as $n \to \infty$ for all $m > n$, $\omega \in A$. Therefore, for $\omega \in A$, there exists the limit $\eta(t) = \lim_{n \to \infty} \xi_n(t)$, and $|\xi_n(t) - \eta(t)| \le \varepsilon_n$, so that convergence $\xi_n(t) \to \eta(t)$ is uniform. Together with continuity of $\xi_n(t)$ this implies that $\eta(t)$ is also continuous (this argument actually shows that the space $C(0, 1)$ is complete).

It remains to verify that $\xi$ and $\eta$ are equivalent. For $t = t_{n,r}$ one has $\xi_{n+k}(t) = \xi(t)$ for all $k \ge 0$, so that $\eta(t) = \xi(t)$. If $t \ne t_{n,r}$ for all $n$ and $r$, then there exists a sequence $r_n$ such that $t_{n,r_n} \to t$ and $0 < t - t_{n,r_n} < 2^{-n}$, and hence

$$\mathbf{P}\bigl(|\xi(t_{n,r_n}) - \xi(t)| > \varepsilon(t - t_{n,r_n})\bigr) \le q(t - t_{n,r_n}),$$

$$\mathbf{P}\bigl(|\xi(t_{n,r_n}) - \xi(t)| > \varepsilon(2^{-n})\bigr) \le q(2^{-n}).$$

By the Borel-Cantelli criterion this means that $\xi(t_{n,r_n}) \to \xi(t)$ with probability 1. At the same time, by virtue of the continuity of $\eta(t)$ one has $\eta(t_{n,r_n}) \to \eta(t)$. Because $\xi(t_{n,r_n}) = \eta(t_{n,r_n})$, we have $\xi(t) = \eta(t)$ with probability 1.

The theorem is proved. □

Corollary 18.2.1 If

$$\mathbf{E}|\xi(t + h) - \xi(t)|^a \le \frac{c|h|}{|\log|h||^{1+b}} \tag{18.2.2}$$

for some $b > a > 0$ and $c < \infty$, then the conditions of Theorem 18.2.2 are satisfied and hence $\xi(t)$ has a continuous modification.

Condition (18.2.2) will certainly be satisfied if (18.2.1) holds, so that Kolmogorov’s theorem is a consequence of Theorem 18.2.2.

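Proof of Corollary 18.2.1 Put $\varepsilon(h) := |\log_2|h||^{-\beta}$, $1 < \beta < b/a$. Then

$$\sum_{n=1}^{\infty} \varepsilon(2^{-n}) = \sum_{n=1}^{\infty} n^{-\beta} < \infty,$$

and from Chebyshev’s inequality we have

$$\mathbf{P}\bigl(|\xi(t + h) - \xi(t)| > \varepsilon(h)\bigr) \le \frac{c|h|}{|\log_2|h||^{1+b}} \bigl(\varepsilon(h)\bigr)^{-a} = \frac{c|h|}{|\log_2|h||^{1+\delta}} =: q(h),$$

where $\delta = b - a\beta > 0$. It remains to note that

$$\sum_{n=1}^{\infty} 2^n q(2^{-n}) = c\sum_{n=1}^{\infty} |\log_2 2^{-n}|^{-1-\delta} < \infty.$$

The corollary is proved. □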
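The two series appearing in the proof of Corollary 18.2.1 can be examined numerically. The following sketch only illustrates the choice of $\varepsilon(h)$ and $q(h)$ made above; the sample values of $a$, $b$, $c$, the particular $\beta$ and the number of terms are assumptions made here.

```python
import math

def eps(h, beta):
    """eps(h) = |log2 |h||^(-beta), as chosen in the proof."""
    return abs(math.log2(abs(h))) ** -beta

def q(h, c, delta):
    """q(h) = c |h| / |log2 |h||^(1 + delta)."""
    return c * abs(h) / abs(math.log2(abs(h))) ** (1 + delta)

a, b, c = 1.0, 3.0, 1.0        # assumed sample values with b > a > 0
beta = 0.5 * (1 + b / a)       # one admissible choice, 1 < beta < b/a
delta = b - a * beta           # positive by the choice of beta

n_max = 1000                   # 2**-n_max is still a normal floating-point number
eps_partial = sum(eps(2.0 ** -n, beta) for n in range(1, n_max + 1))
q_partial = sum(2.0 ** n * q(2.0 ** -n, c, delta) for n in range(1, n_max + 1))
```

With these values both partial sums reduce to $\sum_{n \le 1000} n^{-2}$ and stay below $\pi^2/6$, in line with the convergence required by Theorem 18.2.2.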


The criterion for $\xi(t)$ to have a modification belonging to the space $D(T)$ is more complicated to formulate and prove, and is related to weaker conditions imposed on the process. We confine ourselves here to simply stating the following assertion.


Theorem 18.2.3 (Kolmogorov-Chentsov) If for some $\alpha > 0$, $\beta > 0$, $b > 0$, and all $t$, $h_1 \le t \le 1 - h_2$, $h_1 \ge 0$, $h_2 \ge 0$,

$$\mathbf{E}|\xi(t) - \xi(t - h_1)|^{\alpha}\, |\xi(t + h_2) - \xi(t)|^{\beta} \le ch^{1+b}, \qquad h = h_1 + h_2, \tag{18.2.3}$$

then there exists a modification of $\xi(t)$ in $D(0, 1)$.

Condition (18.2.3) admits the following extension:

$$\mathbf{P}\bigl(|\xi(t + h_2) - \xi(t)| \cdot |\xi(t) - \xi(t - h_1)| \ge \varepsilon(h)\bigr) \le q(h), \tag{18.2.4}$$

where $\varepsilon(h)$ and $q(h)$ have the same meaning as in Theorem 18.2.2. Under condition (18.2.4) the assertion of the theorem remains valid.

The  following  two  examples  illustrate,  to  a  certain  extent,  the  character  of  the 
conditions  of  Theorems  18.2.1-18.2.3. 


⁴ For more details, see, e.g., [9].

Example 18.2.1 Assume that a random process $\xi(t)$ has the form

$$\xi(t) = \sum_{k=1}^{r} \xi_k \varphi_k(t),$$

where $\varphi_k(t)$ satisfy the Hölder condition

$$|\varphi_k(t + h) - \varphi_k(t)| \le c|h|^{\alpha},$$

$\alpha > 0$, and $(\xi_1, \ldots, \xi_r)$ is an arbitrary random vector such that all $\mathbf{E}|\xi_k|^l$ are finite for some $l > 1/\alpha$. Then the process $\xi(t)$ (which is clearly continuous) satisfies condition (18.2.1). Indeed,

$$\mathbf{E}|\xi(t + h) - \xi(t)|^l \le c_1|h|^{\alpha l} \sum_{k=1}^{r} \mathbf{E}|\xi_k|^l \le c_2|h|^{\alpha l}, \qquad \alpha l > 1.$$

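Example 18.2.2 Let $\gamma$ be uniformly distributed on $[0, 1]$, $\xi(t) = 0$ for $t < \gamma$, and $\xi(t) = 1$ for $t \ge \gamma$. Then

$$\mathbf{E}|\xi(t + h) - \xi(t)|^l = \mathbf{P}\bigl(\gamma \in (t, t + h]\bigr) = h$$

for any $l > 0$. Here condition (18.2.1) is not satisfied, although $|\xi(t + h) - \xi(t)| \xrightarrow{p} 0$ as $h \to 0$. Condition (18.2.3) is clearly met, for

$$\mathbf{E}|\xi(t) - \xi(t - h_1)| \cdot |\xi(t + h_2) - \xi(t)| = 0. \tag{18.2.5}$$

We will get similar results if we take $\xi(t)$ to be the renewal process for a sequence $\gamma_1, \gamma_2, \ldots$, where the distribution of $\gamma_j$ has a density. In that case, instead of (18.2.5) one will obtain the relation

$$\mathbf{E}|\xi(t) - \xi(t - h_1)| \cdot |\xi(t + h_2) - \xi(t)| \le ch_1h_2 \le ch^2.$$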

In the general case, when we do not have data for constructing modifications of the process $\xi$ in the spaces $C(T)$ or $D(T)$, one can overcome the difficulties mentioned in Sect. 18.1 with the help of the notion of separability.


Definition 18.2.1 A process $\xi(t)$ is said to be separable if there exists a countable set $S$ which is everywhere dense in $T$ and

$$\mathbf{P}\Bigl(\limsup_{\substack{u \to t \\ u \in S}} \xi(u) \ge \xi(t) \ge \liminf_{\substack{u \to t \\ u \in S}} \xi(u) \text{ for all } t \in T\Bigr) = 1. \tag{18.2.6}$$

This is equivalent to the property that, for any interval $I \subset T$,

$$\mathbf{P}\Bigl(\sup_{u \in I \cap S} \xi(u) = \sup_{u \in I} \xi(u);\ \inf_{u \in I \cap S} \xi(u) = \inf_{u \in I} \xi(u)\Bigr) = 1.$$

It is known (Doob’s theorem⁵) that any random process has a separable modification.

⁵ See [14, 26].

Constructing a separable modification of a process, as well as constructing modifications in spaces $C(T)$ and $D(T)$, means extending the σ-algebra $\mathfrak{B}^T_{\mathbb{R}}$, to which one adds uncountable intersections of the form

$$A = \bigcap_{u \in I} \{\xi(u) \in [a, b]\} = \Bigl\{\sup_{u \in I} \xi(u) \le b,\ \inf_{u \in I} \xi(u) \ge a\Bigr\},$$

and extending the measure $\mathbf{P}$ to the extended σ-algebra using the equalities

$$\mathbf{P}(A) = \mathbf{P}\Bigl(\bigcap_{u \in I \cap S} \{\xi(u) \in [a, b]\}\Bigr),$$

where in the probability on the right-hand side we already have an element of $\mathfrak{B}^T_{\mathbb{R}}$.

For separable processes, such sets as the set of all nondecreasing functions, the sets $C(T)$, $D(T)$ and so on, are events. Processes from $C(T)$ or $D(T)$ are automatically separable. And vice versa, if a process is separable and admits a continuous modification (modification from $D(T)$) then it will be continuous (belong to $D(T)$) itself. Indeed, if $\eta$ is a continuous modification of $\xi$ then

$$\mathbf{P}\bigl(\xi(t) = \eta(t) \text{ for all } t \in S\bigr) = 1.$$

From this and (18.2.6) we obtain

$$\mathbf{P}\Bigl(\limsup_{\substack{u \to t \\ u \in S}} \eta(u) \ge \xi(t) \ge \liminf_{\substack{u \to t \\ u \in S}} \eta(u) \text{ for all } t \in T\Bigr) = 1.$$

Since $\limsup_{u \to t} \eta(u) = \liminf_{u \to t} \eta(u) = \eta(t)$, one has

$$\mathbf{P}\bigl(\xi(t) = \eta(t) \text{ for all } t \in T\bigr) = 1.$$

In Example 18.1.1, the process $\xi_1(t)$ is clearly not separable. The process $\xi_0(t)$ is a separable modification of $\xi_1(t)$.

As well as pathwise continuity, there is one more way of characterising the continuity of a random process.


Definition 18.2.2 A random process $\xi(t)$ is said to be stochastically continuous if, for all $t \in T$, as $h \to 0$,

$$\xi(t + h) \xrightarrow{p} \xi(t) \qquad \bigl(\mathbf{P}(|\xi(t + h) - \xi(t)| > \varepsilon) \to 0 \text{ for any } \varepsilon > 0\bigr).$$

Here we deal with the two-dimensional distributions of $\xi(t)$ only.

It is clear that all processes with continuous trajectories are stochastically continuous. But not only them. The discontinuous processes from Examples 18.1.1 and 18.2.2 are also stochastically continuous. A discontinuous process is not stochastically continuous if, for a (random) discontinuity point $\tau$ ($\xi(\tau + 0) \ne \xi(\tau - 0)$), the probability $\mathbf{P}(\tau = t_0)$ is positive for some fixed point $t_0$.


Definition 18.2.3 A process $\xi(t)$ is said to be continuous in mean of order $r$ (in mean when $r = 1$; in mean quadratic when $r = 2$) if, for all $t \in T$, as $h \to 0$,

$$\xi(t + h) \xrightarrow{(r)} \xi(t)$$

or, which is the same,

$$\mathbf{E}|\xi(t + h) - \xi(t)|^r \to 0.$$

The discontinuous process $\xi(t)$ from Example 18.2.2 is continuous in mean of any order. Therefore the continuity in mean and stochastic continuity do not say much about the pathwise properties (they only say that a jump in a neighbourhood of any fixed point $t$ is unlikely). As Kolmogorov’s theorem shows, in order to characterise the properties of trajectories, one needs quantitative bounds for $\mathbf{E}|\xi(t + h) - \xi(t)|^r$ or for $\mathbf{P}(|\xi(t + h) - \xi(t)| > \varepsilon)$.

Continuity theorems for moments imply that, for a stochastically continuous process $\xi(t)$ and any continuous bounded function $g(x)$, the function $\mathbf{E}g(\xi(t))$ is continuous. This assertion remains valid if we replace the boundedness of $g(x)$ with the condition that

$$\sup_t \mathbf{E}|g(\xi(t))|^{\alpha} < \infty \quad \text{for some } \alpha > 1.$$

The subsequent Chaps. 19, 21 and 22 will be devoted to studying random processes which can be given by specifying the explicit form of their finite-dimensional distributions. To this class belong:

1.  Processes  with  independent  increments. 

2.  Markov  processes. 

3.  Gaussian  processes. 

In  Chap.  22  we  will  also  consider  some  problems  of  the  theory  of  processes  with 
finite  second  moments.  Chapter  20  contains  limit  theorems  for  random  processes 
generated  by  partial  sums  of  independent  random  variables. 


Chapter  19 

Processes  with  Independent  Increments 


Abstract  Section  19.1  introduces  the  fundamental  concept  of  infinitely  divisible 
distributions  and  contains  the  key  theorem  on  relationship  of  such  processes  to 
processes  with  independent  homogeneous  increments.  Section  19.2  begins  with  a 
definition  of  the  Wiener  process  based  on  its  finite-dimensional  distributions  and 
establishes  existence  of  a  continuous  modification  of  the  process.  It  also  derives  the 
distribution  of  the  maximum  of  the  Wiener  process  on  a  finite  interval.  The  Laws 
of the Iterated Logarithm for the Wiener process are established in Sect. 19.3. Section 19.4 is devoted to the Poisson processes, while Sect. 19.5 presents a characterisation of the class of processes with independent increments (the Lévy-Khintchin theorem).


19.1  General  Properties 

Definition 19.1.1 A process $\{\xi(t),\ t \in [a, b]\}$ given on the interval $[a, b]$ is said to be a process with independent increments if, for any $n$ and $t_0 < t_1 < \cdots < t_n$, $a \le t_0$, $t_n \le b$, the random variables $\xi(t_0)$, $\xi(t_1) - \xi(t_0)$, …, $\xi(t_n) - \xi(t_{n-1})$ are independent.

A process with independent increments is called homogeneous if the distribution of $\xi(t_1) - \xi(t_0)$ is determined by the length of the interval $t_1 - t_0$ only and is independent of $t_0$.

In what follows, we will everywhere assume for simplicity’s sake that $a = 0$, $\xi(0) = 0$ and $b = 1$ or $b = \infty$.


Definition 19.1.2 The distribution of a random variable $\xi$ is called infinitely divisible (cf. Sect. 8.8) if, for any $n$, the variable $\xi$ can be represented as a sum of independent identically distributed random variables: $\xi = \xi_{1,n} + \cdots + \xi_{n,n}$. If $\varphi(\lambda)$ is the ch.f. of $\xi$, then this is equivalent to the property that $\varphi^{1/n}$ is a ch.f. for any $n$.

It is clear from the above definitions that, for a homogeneous process with independent increments, the distribution of $\xi(t)$ is infinitely divisible, because $\xi(t) = \xi_{1,n} + \cdots + \xi_{n,n}$, where $\xi_{k,n} = \xi(kt/n) - \xi((k-1)t/n)$ are independent and distributed as $\xi(t/n)$.
Theorem 19.1.1

(1) Let $\{\xi(t),\ t \ge 0\}$ be a stochastically continuous homogeneous process with independent increments, and let $\varphi_t(\lambda) = \mathbf{E} e^{i\lambda \xi(t)}$ be the ch.f. of $\xi(t)$, $\varphi(\lambda) := \varphi_1(\lambda)$. Then

$$\varphi_t(\lambda) = \varphi^t(\lambda), \tag{19.1.1}$$

and $\varphi(\lambda) \ne 0$ for any $\lambda$.

(2) Let $\varphi(\lambda)$ be the ch.f. of an infinitely divisible distribution. Then there exists a random process $\{\xi(t),\ t \ge 0\}$ satisfying the conditions of (1) and such that

$$\mathbf{E} e^{i\lambda \xi(1)} = \varphi(\lambda).$$


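Note that in the theorem the power $\varphi^t(\lambda)$ of the complex number $\varphi(\lambda)$ is understood as $|\varphi(\lambda)|^t e^{i\alpha(\lambda)t}$, where $\alpha(\lambda) = \arg \varphi(\lambda)$ (so that $\varphi(\lambda) = |\varphi(\lambda)| e^{i\alpha(\lambda)}$). But $\alpha(\lambda)$ is a multi-valued function, which is defined up to the term $2\pi k$ with integer $k$. Therefore, for non-integer $t$, the function $\varphi^t(\lambda)$ will be multi-valued as well. Since any ch.f. is continuous, after crossing the level $2\pi k$ by $\alpha(\lambda)$ (while changing the value of $\lambda$ from zero, $\alpha(0) = 0$), we are to take the "nearest" branch of $\alpha(\lambda)$ so as to ensure continuity of the function $\varphi^t(\lambda)$. For example, for the degenerate distribution $\mathbf{I}_1$ we have $\varphi(\lambda) = e^{i\lambda}$ ($\alpha(\lambda) = \lambda$), so for small $t > 0$, $\varepsilon > 0$ and for $\lambda = 2\pi + \varepsilon$ we are to set $\varphi^t(\lambda) = e^{i(2\pi + \varepsilon)t}$ rather than $\varphi^t(\lambda) = e^{i\varepsilon t}$ (although $\varphi(\lambda) = e^{i\varepsilon}$).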
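The branch convention just described can be tried out numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `continuous_power` and the grid are illustrative choices, not part of the text. It follows the argument of $\varphi$ continuously from $\lambda = 0$ (using `np.unwrap`) and compares the result with the principal-branch power for the degenerate distribution at 1.

```python
import numpy as np

def continuous_power(phi_values, t):
    """Return phi^t on a lambda-grid that starts at 0, using the
    continuous branch of arg(phi) with alpha(0) = 0."""
    phi_values = np.asarray(phi_values, dtype=complex)
    modulus = np.abs(phi_values)
    # np.unwrap removes the 2*pi jumps of the principal argument,
    # which is the "nearest branch" rule on a fine enough grid.
    alpha = np.unwrap(np.angle(phi_values))
    return modulus ** t * np.exp(1j * alpha * t)

lam = np.linspace(0.0, 7.0, 7001)   # grid starts at lambda = 0
phi = np.exp(1j * lam)              # ch.f. of the degenerate distribution at 1
t = 0.3
power = continuous_power(phi, t)    # continuous branch: exp(i * lam * t)
naive = phi ** t                    # principal branch, discontinuous in lambda
```

On this grid `power` agrees with $e^{i\lambda t}$ everywhere, while `naive` departs from it once the principal argument wraps around.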

Denote by $\mathfrak{L}$ the class of ch.f.s of all infinitely divisible distributions and by $\mathfrak{L}_1$ the class of the ch.f.s of the distributions of $\xi(t)$ for stochastically continuous homogeneous processes with independent increments. Then it follows from Theorem 19.1.1 that $\mathfrak{L} = \mathfrak{L}_1$. The class $\mathfrak{L}$ will be characterised in Sect. 19.5.

Proof (1) Let $\xi(t)$ satisfy the conditions of part (1) of the theorem. Then $\xi(t)$ can be represented as a sum of independent increments

$$\xi(t) = \sum_{j=1}^{n} \bigl[\xi(t_j) - \xi(t_{j-1})\bigr], \quad t_0 = 0,\ t_n = t,\ t_j > t_{j-1}.$$

From this it follows, in particular, that for $t_j = j/n$, $t = 1$,

$$\varphi(\lambda) = \bigl[\varphi_{1/n}(\lambda)\bigr]^n, \quad \varphi_{1/n}(\lambda) = \varphi^{1/n}(\lambda).$$

Raising both sides of the last equality to an integer power $k$, we obtain that, for any rational $r = k/n$, one has

$$\varphi_{k/n}(\lambda) = \varphi^{k/n}(\lambda),$$

which proves (19.1.1) for $t = r$. Now let $t$ be irrational and $r_n := \lfloor tn \rfloor / n$. Since $\xi(t)$ is a stochastically continuous process, one has $\xi(r_n) \xrightarrow{p} \xi(t)$ as $n \to \infty$, and hence the corresponding ch.f.s converge: for any $\lambda$,

$$\varphi_{r_n}(\lambda) \to \varphi_t(\lambda).$$

But $\varphi_{r_n}(\lambda) = \varphi^{r_n}(\lambda) \to \varphi^t(\lambda)$. Therefore (19.1.1) necessarily holds true.

Further, by stochastic continuity of $\xi(\cdot)$, we have $\varphi_t(\lambda) = \varphi^t(\lambda) \to 1$ as $t \to 0$ for any $\lambda$. This implies that $\varphi(\lambda) \ne 0$ for any $\lambda$. This completes the proof of the first assertion of the theorem.

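(2) Observe first that if $\varphi \in \mathfrak{L}$ then, for any $t > 0$, $\varphi^t$ is again a ch.f. Indeed,

$$\varphi^t(\lambda) = \lim_{n \to \infty} \varphi^{\lfloor tn \rfloor / n}(\lambda),$$

so that $\varphi^t(\lambda)$ is a limit of ch.f.s which is continuous at the point $\lambda = 0$. By the continuity theorem for ch.f.s, this is again a ch.f.

Now we will construct a random process $\xi(t)$ with independent increments by specifying its finite-dimensional distributions. Put

$$0 = t_0 < t_1 < \cdots < t_k, \quad \Delta_j := \xi(t_j) - \xi(t_{j-1}), \quad \delta_j := t_j - t_{j-1},$$

and observe that

$$\sum_{j=1}^{k} \lambda_j \xi(t_j) = \sum_{j=1}^{k} \lambda_j \sum_{l=1}^{j} \Delta_l = \sum_{j=1}^{k} \Delta_j \sum_{l=j}^{k} \lambda_l.$$

Define the ch.f. of the joint distribution of $\xi(t_1), \ldots, \xi(t_k)$ by the equality (postulating independence of $\Delta_j$)

$$\mathbf{E} \exp\Bigl\{ i \sum_{j=1}^{k} \lambda_j \xi(t_j) \Bigr\} = \mathbf{E} \exp\Bigl\{ i \sum_{j=1}^{k} \Delta_j \sum_{l=j}^{k} \lambda_l \Bigr\} := \prod_{j=1}^{k} \varphi^{\delta_j}\Bigl( \sum_{l=j}^{k} \lambda_l \Bigr).$$

Thus, we have used $\varphi$ to define the finite-dimensional distributions of $\xi(t)$ in $(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^T)$ with $T = [0, \infty)$ which, as one can easily see, are consistent. By Kolmogorov's theorem, there exists a distribution of a random process $\xi(t)$ in $(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^T)$. That process is by definition a homogeneous process with independent increments.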

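To prove stochastic continuity of $\xi(t)$, note that, as $h \to 0$,

$$\mathbf{E} e^{i\lambda(\xi(t+h) - \xi(t))} = \varphi^h(\lambda) \to \varphi_0(\lambda),$$

where

$$\varphi_0(\lambda) = \begin{cases} 1 & \text{if } \varphi(\lambda) \ne 0, \\ 0 & \text{if } \varphi(\lambda) = 0. \end{cases}$$

Thus the limiting function $\varphi_0(\lambda)$ can assume only two values: 0 and 1. But it is bound to be a ch.f. since it is continuous at the point $\lambda = 0$ ($\varphi(\lambda) \ne 0$ in a neighbourhood of the point $\lambda = 0$) and is a limit of ch.f.s. Therefore $\varphi_0(\lambda)$ is continuous, $\varphi_0(\lambda) \equiv 1$, $\varphi^h(\lambda) \to 1$, and

$$\xi(t + h) - \xi(t) \xrightarrow{p} 0 \quad \text{as } h \to 0.$$

The theorem is proved. □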


Corollary 19.1.1 Let the conditions of part (1) of Theorem 19.1.1 be met. If, for all $t$, $\mathbf{E}|\xi(t)| < \infty$ then

$$\mathbf{E}\xi(t) = t\,\mathbf{E}\xi(1).$$

If $\mathbf{E}(\xi(1))^2 < \infty$ then

$$\operatorname{Var} \xi(t) = t \operatorname{Var} \xi(1).$$

Proof For the sake of brevity, put $a := \mathbf{E}\xi(1)$. Then, differentiating (19.1.1) in $\lambda$ at the point $\lambda = 0$, we obtain

$$\mathbf{E}\xi(t) = -i\varphi_t'(0) = -it\varphi^{t-1}(0)\varphi'(0) = at,$$

$$\mathbf{E}\xi^2(t) = -\varphi_t''(0) = -t(t-1)\varphi^{t-2}(0)\bigl(\varphi'(0)\bigr)^2 - t\varphi^{t-1}(0)\varphi''(0) = t(t-1)a^2 + t\,\mathbf{E}\xi^2(1),$$

$$\operatorname{Var}\xi(t) = t\bigl(\mathbf{E}\xi^2(1) - a^2\bigr) = t \operatorname{Var}\xi(1).$$

The corollary is proved. □

In  the  next  theorem  we  put,  as  before,  T  =  [0,  1]  or  T  =  [0,  oo). 

Theorem 19.1.2 Homogeneous stochastically continuous processes with independent increments $\{\xi(t),\ t \in T\}$ have modifications in the space $D(T)$, i.e. the process $\xi(t)$ can be given in $(D(T), \mathfrak{B}_D^T)$ and hence has no discontinuities of the second type.

Proof To simplify the argument, assume that $\mathbf{E}\xi^2(1)$ exists, or, which is the same, that the second derivative $\varphi''(\lambda)$ exists. Then

$$\mathbf{E}\bigl(\xi(t) - \xi(t-h)\bigr)^2 = -\varphi_h''(0) = -h(h-1)\bigl[\varphi'(0)\bigr]^2 - h\varphi''(0) \le c|h|,$$

$$\mathbf{E}\bigl(|\xi(t+h_1) - \xi(t)|^2\, |\xi(t) - \xi(t-h_2)|^2\bigr) \le c^2 h_1 h_2 \le c^2 (h_1 + h_2)^2,$$

and the assertion follows from the second criterion of Theorem 18.2.3. The theorem is proved. □

In the general case, the proof is more complicated: one has to make use of criterion (18.2.4) and bounds for $\mathbf{P}(|\xi(t) - \xi(t-h)| > \varepsilon)$.

Now we will consider the two most important processes with independent increments: the so-called Wiener and Poisson processes.


19.2  Wiener  Processes.  The  Properties  of  Trajectories 

Definition 19.2.1 The Wiener process is a homogeneous process with independent increments for which the distribution of $\xi(1)$ is normal.

In other words, this is a process for which

$$\varphi(\lambda) = e^{i\lambda a - \sigma^2\lambda^2/2}, \quad \varphi_t(\lambda) = \varphi^t(\lambda) = e^{i\lambda t a - \sigma^2\lambda^2 t/2}$$

for some $a$ and $\sigma^2 > 0$. The second equality means that the increments $\xi(t+u) - \xi(u)$ are normally distributed with parameters $(at, \sigma^2 t)$. All joint distributions of $\xi(t_1), \ldots, \xi(t_n)$ are clearly also normal.

The numbers $a$ and $\sigma$ are called the shift and diffusion coefficients, respectively. Introducing the process $\xi_0(t) := (\xi(t) - at)/\sigma$, which is obtained from $\xi(t)$ by an affine transformation, we obtain that its ch.f. equals

$$\mathbf{E} e^{i\lambda \xi_0(t)} = e^{-i\lambda a t/\sigma} \varphi_t(\lambda/\sigma) = e^{-\lambda^2 t/2}.$$

Such a process with parameters $(0, t)$ is often called the standard Wiener process. We consider it in more detail.
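The definition translates directly into a simulation recipe on a time grid: sum independent normal increments with mean $a\,\Delta t$ and variance $\sigma^2 \Delta t$. The sketch below assumes NumPy; the function name `wiener_path`, the grid and the parameter values are illustrative assumptions, and a discrete grid only samples the process at finitely many points.

```python
import numpy as np

def wiener_path(a, sigma, t_grid, rng):
    """Sample xi(t) on t_grid (with t_grid[0] == 0) for a Wiener process
    with shift coefficient a and diffusion coefficient sigma."""
    t_grid = np.asarray(t_grid, dtype=float)
    dt = np.diff(t_grid)
    # Independent increments: xi(t + dt) - xi(t) is N(a * dt, sigma^2 * dt).
    increments = a * dt + sigma * np.sqrt(dt) * rng.standard_normal(dt.size)
    return np.concatenate(([0.0], np.cumsum(increments)))

rng = np.random.default_rng(0)
t_grid = np.linspace(0.0, 1.0, 1001)
xi = wiener_path(a=2.0, sigma=0.5, t_grid=t_grid, rng=rng)
# Affine transformation to the standard process xi_0(t) = (xi(t) - a t) / sigma.
xi0 = (xi - 2.0 * t_grid) / 0.5
```

With the same seed, `xi0` coincides (up to rounding) with a path generated directly with `a=0.0` and `sigma=1.0`, which mirrors the affine transformation in the text.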

Theorem  19.2.1  The  Wiener  process  has  a  continuous  modification. 

This means, as we know, that the Wiener process $\{\xi(t),\ t \in [0,1]\}$ can be considered as given on the measurable space $(C(0,1), \mathfrak{B}_C^{[0,1]})$ of continuous functions.

Proof We have $\xi(t+h) - \xi(t) \mathrel{\subset\!=} \Phi_{0,h}$ and $h^{-1/2}(\xi(t+h) - \xi(t)) \mathrel{\subset\!=} \Phi_{0,1}$. Therefore

$$\mathbf{E}\bigl(\xi(t+h) - \xi(t)\bigr)^4 = h^2\, \mathbf{E}\xi(1)^4 = 3h^2.$$

This means that the conditions of Theorem 18.2.1 are satisfied. □

Thus we can assume that $\xi(\cdot) \in C(0,1)$. The standard Wiener process with continuous trajectories will be denoted by $\{w(t),\ t \in T\}$.

Now note that the trajectories of the Wiener process $w(t)$, being continuous, are not differentiable with probability 1 at any given point $t$.

By virtue of the homogeneity of the process, it suffices to prove its nondifferentiability at the point 0. If, with a positive probability, i.e. on an event $A \subset \Omega$ with $\mathbf{P}(A) > 0$, there existed the derivative

$$w'(0) = w'(0, \omega) = \lim_{t \to 0} \frac{w(t)}{t},$$

then, on the same event, there would exist the limit

$$\lim_{k \to \infty} \frac{w(2^{-k+1}) - w(2^{-k})}{2^{-k}} = \lim_{k \to \infty} \frac{2w(2^{-k+1})}{2^{-k+1}} - \lim_{k \to \infty} \frac{w(2^{-k})}{2^{-k}} = 2w'(0) - w'(0) = w'(0).$$

But this is impossible for the following reason. The independent differences $w(2^{-k+1}) - w(2^{-k})$ have the same distribution as $w(2^{-k})$, and with the positive probability $p = 1 - \Phi(1)$ they exceed the value $\sqrt{2^{-k}}$. That is, the independent events $B_k = \{w(2^{-k+1}) - w(2^{-k}) > \sqrt{2^{-k}}\}$ have the property $\sum_{k=1}^{\infty} \mathbf{P}(B_k) = \infty$. By the Borel–Cantelli criterion, this means that with probability 1 there occur infinitely many events $B_k$, so that

$$\mathbf{P}\Bigl( \limsup_{k \to \infty} \frac{w(2^{-k+1}) - w(2^{-k})}{\sqrt{2^{-k}}} \ge 1 \Bigr) = 1.$$

In the same way we find that

$$\mathbf{P}\Bigl( \liminf_{k \to \infty} \frac{w(2^{-k+1}) - w(2^{-k})}{\sqrt{2^{-k}}} \le -1 \Bigr) = 1.$$

This implies that, with probability 1,

$$\limsup_{k \to \infty} \frac{w(2^{-k+1}) - w(2^{-k})}{2^{-k}} = \infty, \quad \liminf_{k \to \infty} \frac{w(2^{-k+1}) - w(2^{-k})}{2^{-k}} = -\infty,$$

and therefore the process $w(t)$ is nondifferentiable at any given point $t$ with probability 1.

A  stronger  assertion  also  takes  place:  with  probability  1  there  exists  no  point  t 
at  which  the  trajectory  of  the  process  w(t)  would  have  a  derivative.  In  other  words, 
the  Wiener  process  is  nowhere  differentiable  with  probability  1 .  The  proof  of  this 
fact  is  much  more  complicated  and  lies  beyond  the  scope  of  the  book. 

The reader can easily verify that $w(t)$ has, in a certain sense, a parabola property. Namely, for any $c > 0$, the process $w^*(t) = c^{-1/2} w(ct)$ is again a Wiener process.

The properties of continuity of trajectories and independence of increments for the Wiener process allow us to find, in an explicit form, the distributions of

$$\overline{w}(t) = \max_{u \in [0,t]} w(u)$$

and of the time of the first passage of a given level which is defined, for a given $x > 0$, by

$$\eta(x) := \inf\{t : w(t) \ge x\} = \inf\{t : w(t) = x\}.$$


Theorem 19.2.2

$$\mathbf{P}\bigl(\overline{w}(t) > x\bigr) = 2\mathbf{P}\bigl(w(t) > x\bigr) = 2\Bigl(1 - \Phi\Bigl(\frac{x}{\sqrt{t}}\Bigr)\Bigr). \tag{19.2.1}$$

The distribution of $\eta(1)$ is stable and has the density

$$\frac{1}{\sqrt{2\pi}\, t^{3/2}}\, e^{-1/(2t)}, \quad t > 0. \tag{19.2.2}$$

Distribution (19.2.1) is sometimes called the double normal tail law, while the distribution with density (19.2.2) is called the Lévy distribution (see Sect. 8.8).

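Proof Since

$$\{\eta(x) = v\} = \bigcap_{n=1}^{\infty} \bigl\{\overline{w}(v - 1/n) < x,\ w(v) = x\bigr\} \in \mathfrak{F}_v := \sigma\{w(u);\ u \le v\}$$

and $w(t) - w(v) \overset{d}{=} w(t - v)$ for $t > v$ does not depend on $\mathfrak{F}_v$, we have

$$\mathbf{P}\bigl(w(t) > x\bigr) = \int_0^t \mathbf{P}\bigl(\eta(x) \in dv\bigr)\, \mathbf{P}\bigl(w(t - v) > 0\bigr) = \frac{1}{2} \int_0^t \mathbf{P}\bigl(\eta(x) \in dv\bigr) = \frac{1}{2}\, \mathbf{P}\bigl(\overline{w}(t) \ge x\bigr).$$

This implies the first assertion of the theorem.

The same equalities imply that

$$\mathbf{P}\bigl(\eta(x) < v\bigr) = 2\mathbf{P}\bigl(w(v) > x\bigr) = 2\Bigl(1 - \Phi\Bigl(\frac{x}{\sqrt{v}}\Bigr)\Bigr),$$

which yields, for the density $f_\eta$ of the variable $\eta := \eta(1)$,

$$f_\eta(v) = \frac{1}{\sqrt{2\pi}\, v^{3/2}}\, e^{-1/(2v)}.$$

In order to prove that this distribution is stable, note that

$$\eta(n) = \eta_1 + \cdots + \eta_n,$$

where $\eta_i$ are distributed as $\eta$ and are independent (since the path of $w(t)$ first attains level 1; then level 2, starting at a point with ordinate 1; then level 3, and so on). Using the same argument as above, we obtain that

$$\mathbf{P}\bigl(\eta(n) < v\bigr) = \mathbf{P}\bigl(\overline{w}(v) > n\bigr) = \mathbf{P}\bigl(\overline{w}(vn^{-2}) > 1\bigr) = \mathbf{P}\bigl(\eta < vn^{-2}\bigr),$$

so the distributions of $\eta$ and $\eta(n)$ coincide up to a scale transformation. This implies the stability of the distribution of $\eta$ (see Sect. 8.8). Since $\eta \ge 0$ and $\mathbf{P}(\eta > x) \sim \sqrt{2/(\pi x)}$ as $x \to \infty$, we obtain that it is, up to a scale transformation, the distribution $\mathbf{F}_{1/2,1}$ with parameters $\beta = 1/2$, $\rho = 1$ (cf. Sect. 8.8). The theorem is proved. □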
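As a numerical sanity check of Theorem 19.2.2, one can integrate the density (19.2.2) and compare the result with the double normal tail $2(1 - \Phi(1/\sqrt{v}))$. The sketch below uses only the Python standard library; the function names and the midpoint-rule integrator are illustrative assumptions, not part of the text.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def first_passage_cdf(v, x=1.0):
    """P(eta(x) <= v) = 2 * (1 - Phi(x / sqrt(v))) for v > 0."""
    return 2.0 * (1.0 - std_normal_cdf(x / math.sqrt(v)))

def levy_density(v):
    """Density (19.2.2) of eta(1) at v > 0."""
    return math.exp(-1.0 / (2.0 * v)) / (math.sqrt(2.0 * math.pi) * v ** 1.5)

def integrate(f, lo, hi, steps=20000):
    """Composite midpoint rule (never evaluates f at the endpoints)."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

v = 2.0
numeric = integrate(levy_density, 0.0, v)   # integral of the density over (0, v)
exact = first_passage_cdf(v)                # closed form from (19.2.1)
```

The two values agree to within the quadrature error. The scaling relation used in the stability argument can be checked in the same way: `first_passage_cdf(v * n**2, x=n)` matches `first_passage_cdf(v)` for any level `n`.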


19.3  The  Laws  of  the  Iterated  Logarithm 

Using an argument similar to that employed at the end of the previous section, one can establish a much stronger assertion: the trajectory of $w(t)$ in the neighbourhood of the point $t = 0$, graphically speaking, "completely shades" the interior of the domain bounded by the two curves

$$y = \pm\sqrt{2t \ln\ln\frac{1}{t}}.$$

The exterior of this domain remains untouched. This is the so-called law of the iterated logarithm.

Theorem 19.3.1

$$\mathbf{P}\Bigl( \limsup_{t \to 0} \frac{w(t)}{\sqrt{2t \ln\ln(1/t)}} = 1 \Bigr) = \mathbf{P}\Bigl( \liminf_{t \to 0} \frac{w(t)}{\sqrt{2t \ln\ln(1/t)}} = -1 \Bigr) = 1.$$

Thus, if we consider the sequence of random variables $w(t_n)$, $t_n \downarrow 0$, then, for any $\varepsilon > 0$,

$$(1 \pm \varepsilon)\sqrt{2t_n \ln\ln\frac{1}{t_n}}$$

will be upper and lower sequences, respectively, for that sequence.

For processes, we could introduce in a natural way the notions of upper and lower functions. If, for example, a process $\xi(t)$ belongs to $C(0, \infty)$ or $D(0, \infty)$ (or is separable on $(0, \infty)$), then the respective definition for the case $t \to \infty$ has the following form.

Definition 19.3.1 A function $a(t)$ is said to be upper (lower) for the process $\xi(t)$ if, for some sequence $t_n \uparrow \infty$, the events $A_n = \{\sup_{u \ge t_n} (\xi(u) - a(u)) > 0\}$ occur finitely (infinitely) often with probability 1.

Along with Theorem 19.3.1, we will obtain here the conventional law of the iterated logarithm. The proofs of both assertions are essentially identical. We will prove the latter and derive the former as a consequence.

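Theorem 19.3.2 (The Law of the Iterated Logarithm)

$$\mathbf{P}\Bigl( \limsup_{t \to \infty} \frac{w(t)}{\sqrt{2t \ln\ln t}} = 1 \Bigr) = \mathbf{P}\Bigl( \liminf_{t \to \infty} \frac{w(t)}{\sqrt{2t \ln\ln t}} = -1 \Bigr) = 1.$$

Thus, for any $\varepsilon > 0$, the functions $(1 \pm \varepsilon)\sqrt{2t \ln\ln t}$ are, respectively, upper and lower for $w(t)$ as $t \to \infty$.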


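Proof of Theorem 19.3.2 First observe that, by L'Hospital's rule,

$$\int_x^{\infty} e^{-u^2/2}\, du \sim \frac{1}{x}\, e^{-x^2/2} \quad \text{as } x \to \infty. \tag{19.3.1}$$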
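The normal tail asymptotics used in this proof, namely that $\int_x^\infty e^{-u^2/2}\,du$ behaves like $x^{-1}e^{-x^2/2}$ for large $x$, can be checked numerically. The sketch below relies only on `math.erfc` from the standard library; the function names and the sample points are illustrative assumptions.

```python
import math

def normal_tail_integral(x):
    """Integral of exp(-u^2 / 2) over [x, infinity), expressed through
    the complementary error function."""
    return math.sqrt(math.pi / 2.0) * math.erfc(x / math.sqrt(2.0))

def asymptotic(x):
    """Leading-order approximation exp(-x^2 / 2) / x."""
    return math.exp(-x * x / 2.0) / x

# Ratio of the exact tail integral to its approximation at growing x.
ratios = [normal_tail_integral(x) / asymptotic(x) for x in (2.0, 5.0, 10.0, 20.0)]
```

The ratios stay below 1 and increase towards 1 as $x$ grows, in line with the asymptotic equivalence.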


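Let $a > 1$ and $x_k := \sqrt{2a^k \ln\ln a^k}$. We have to show that, for any $\varepsilon > 0$,

$$\mathbf{P}\Bigl( \limsup_{t \to \infty} \frac{w(t)}{\sqrt{2t \ln\ln t}} \le 1 + \varepsilon \Bigr) = 1, \tag{19.3.2}$$

i.e. that, with probability 1, for all sufficiently large $t$,

$$w(t) < (1 + \varepsilon)\sqrt{2t \ln\ln t}.$$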


Fig. 19.1 Illustration to the proof of Theorem 19.3.2: replacing the curvilinear boundary with a step function


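To this end it suffices to establish that, with probability 1, there occur only finitely many events

$$B_k := \Bigl\{ \sup_{a^{k-1} \le u \le a^k} w(u) > (1 + \varepsilon)x_{k-1} \Bigr\}.$$

Consider the events

$$A_k = \Bigl\{ \sup_{u \le a^k} w(u) > (1 + \varepsilon)x_{k-1} \Bigr\} \supset B_k$$

(see Fig. 19.1). Because $x_{k-1}/\sqrt{a^k} \to \infty$ as $k \to \infty$, by Theorem 19.2.2 one has

$$\mathbf{P}(A_k) = 2\mathbf{P}\bigl(w(a^k) > (1 + \varepsilon)x_{k-1}\bigr) \sim \sqrt{\frac{2}{\pi}}\, \frac{\sqrt{a^k}}{(1 + \varepsilon)x_{k-1}} \exp\Bigl\{ -\frac{(1 + \varepsilon)^2\, 2a^{k-1} \ln\ln a^{k-1}}{2a^k} \Bigr\}$$

$$= \frac{\sqrt{a}}{(1 + \varepsilon)\sqrt{\pi \ln\ln a^{k-1}}} \cdot \frac{1}{(\ln a^{k-1})^{(1+\varepsilon)^2/a}} = \frac{c(a, \varepsilon)}{\sqrt{\ln(k-1) + \ln\ln a}\; (k-1)^{(1+\varepsilon)^2/a}}.$$

Put $a := 1 + \varepsilon > 1$. Then clearly

$$\mathbf{P}(A_k) \sim \frac{c(\varepsilon)}{k^{1+\varepsilon}\sqrt{\ln k}}$$

as $k \to \infty$.

In the above formulas, $c(a, \varepsilon)$ and $c(\varepsilon)$ are some constants depending on the indicated parameters. The obtained relation implies that $\sum_{k=1}^{\infty} \mathbf{P}(A_k) < \infty$ and hence $\sum_{k=1}^{\infty} \mathbf{P}(B_k) < \infty$ (for $B_k \subset A_k$), so that by the Borel–Cantelli criterion (Theorem 11.1.1) with probability 1 the events $B_k$ occur only finitely often.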

We now prove that, for an arbitrary $\varepsilon > 0$,

$$\mathbf{P}\Bigl( \limsup_{t \to \infty} \frac{w(t)}{\sqrt{2t \ln\ln t}} > 1 - \varepsilon \Bigr) = 1. \tag{19.3.3}$$

It is evident that, together with (19.3.2), this will mean that the first assertion of the theorem is true.
Consider for $a > 1$ independent increments $w(a^k) - w(a^{k-1})$ and denote by $B_k$ the event

$$B_k := \bigl\{ w(a^k) - w(a^{k-1}) > (1 - \varepsilon/2)x_k \bigr\}.$$

Since $w(a^k) - w(a^{k-1})$ is distributed as $w(a^k(1 - a^{-1}))$, by virtue of (19.3.1) we find, as before, that

$$\mathbf{P}(B_k) \sim \frac{\sqrt{a^k(1 - a^{-1})}}{\sqrt{2\pi}\,(1 - \varepsilon/2)x_k} \exp\Bigl\{ -\frac{(1 - \varepsilon/2)^2\, 2a^k \ln\ln a^k}{2a^k(1 - a^{-1})} \Bigr\} \sim \frac{c_1(a, \varepsilon)}{\sqrt{\ln k}\; k^{(1 - \varepsilon/2)^2/(1 - a^{-1})}}.$$

This implies that, for $a \ge 2/\varepsilon$, the series $\sum_{k=1}^{\infty} \mathbf{P}(B_k)$ diverges, and hence by the Borel–Cantelli criterion the events $B_k$ occur infinitely often, with probability 1.

Further, by the symmetry of the process $w(t)$, it follows from relation (19.3.2) that, for all $k$ large enough and any $\delta > 0$,

$$w(a^k) > -(1 + \delta)x_k.$$

Together with the preceding argument this shows that the event

$$w(a^{k-1}) + \bigl[w(a^k) - w(a^{k-1})\bigr] = w(a^k) > -(1 + \delta)x_{k-1} + (1 - \varepsilon/2)x_k$$

will occur infinitely often. But the right-hand side of the above inequality can be made greater than $(1 - \varepsilon)x_k$ by choosing an appropriate $a$. Indeed,

$$-(1 + \delta)x_{k-1} + \frac{\varepsilon}{2}\, x_k > 0$$

once

$$(1 + \delta)\sqrt{\frac{\ln\ln a^{k-1}}{a \ln\ln a^k}} < \frac{\varepsilon}{2},$$

which, in turn, can easily be achieved by taking $a$ large enough. Thus relation (19.3.3) is proved.

The  second  assertion  of  the  theorem  clearly  follows  from  the  first  by  virtue  of  the 
symmetry  of  the  distribution  of  w(t).  □ 


Now we can obtain as a consequence the local law of the iterated logarithm for the case where $t \to 0$.


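Proof of Theorem 19.3.1 Consider the process $\{W(u) := u\,w(1/u),\ u \ge 0\}$, where we put $W(0) := 0$. The remarkable fact is that the process $\{W(u),\ u \ge 0\}$ is also the standard Wiener process. Indeed, for $t > u$,

$$\mathbf{E}\exp\bigl\{ i\lambda\bigl(W(t) - W(u)\bigr) \bigr\} = \mathbf{E}\exp\Bigl\{ i\lambda\Bigl[ t\,w\Bigl(\frac{1}{t}\Bigr) - u\,w\Bigl(\frac{1}{u}\Bigr) \Bigr] \Bigr\}$$

$$= \mathbf{E}\exp\Bigl\{ i\lambda\Bigl[ w\Bigl(\frac{1}{t}\Bigr)(t - u) - u\Bigl( w\Bigl(\frac{1}{u}\Bigr) - w\Bigl(\frac{1}{t}\Bigr) \Bigr) \Bigr] \Bigr\}$$

$$= \exp\Bigl\{ -\frac{\lambda^2}{2}\Bigl[ \frac{(t - u)^2}{t} + u^2\Bigl( \frac{1}{u} - \frac{1}{t} \Bigr) \Bigr] \Bigr\} = \exp\Bigl\{ -\frac{\lambda^2}{2}(t - u) \Bigr\}.$$

The independence of increments is easiest to prove by establishing their noncorrelatedness. Indeed,

$$\mathbf{E}\bigl[W(u)\bigl(W(t) - W(u)\bigr)\bigr] = \mathbf{E}\Bigl[ u\,w\Bigl(\frac{1}{u}\Bigr)\Bigl( t\,w\Bigl(\frac{1}{t}\Bigr) - u\,w\Bigl(\frac{1}{u}\Bigr) \Bigr) \Bigr] = \mathbf{E}\Bigl[ ut\,w\Bigl(\frac{1}{u}\Bigr) w\Bigl(\frac{1}{t}\Bigr) \Bigr] - \mathbf{E}\Bigl[ u^2 w^2\Bigl(\frac{1}{u}\Bigr) \Bigr] = u - u = 0.$$

To complete the proof of the theorem, it remains to observe that

$$\limsup_{t \to \infty} \frac{w(t)}{\sqrt{2t \ln\ln t}} = \limsup_{u \to 0} \frac{u\,w(1/u)}{\sqrt{2u \ln\ln(1/u)}} = \limsup_{u \to 0} \frac{W(u)}{\sqrt{2u \ln\ln(1/u)}}.$$

The theorem is proved. □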


We  could  also  prove  the  theorem  by  repeating  the  argument  from  the  proof  of 
Theorem  19.3.2  with  a  <  1. 

In  conclusion  we  note  that  Wiener  processes  play  an  important  role  in  many 
theoretical  probabilistic  considerations  and  serve  as  models  for  describing  various 
real-life  processes.  For  example,  they  provide  a  good  model  for  the  movement  of 
a  diffusing  particle.  In  this  connection,  the  Wiener  processes  are  also  often  called 
Brownian  motion  processes. 

Wiener processes prove to be, in a certain sense, the limiting processes for random polygons constructed on the vertices $(k/n, S_k/\sqrt{n})$, where $S_k$ are sums of random variables $\xi_j$ with $\mathbf{E}\xi_j = 0$ and $\operatorname{Var}(\xi_j) = 1$. We will discuss this in more detail in Chap. 20. The concept of the stochastic integral and many other constructions and results are also closely related to the Wiener process.


19.4  The  Poisson  Process 

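Definition 19.4.1 A homogeneous process $\xi(t)$ with independent increments is said to be the Poisson process if $\xi(t) - \xi(0)$ has the Poisson distribution.

For simplicity's sake put $\xi(0) = 0$. If $\xi(1) \mathrel{\subset\!=} \Pi_\mu$, then

$$\varphi(\lambda) := \mathbf{E} e^{i\lambda\xi(1)} = \exp\bigl\{ \mu(e^{i\lambda} - 1) \bigr\}$$

and, as we know,

$$\varphi_t(\lambda) = \mathbf{E} e^{i\lambda\xi(t)} = \varphi^t(\lambda) = \exp\bigl\{ \mu t(e^{i\lambda} - 1) \bigr\},$$

so that $\xi(t) \mathrel{\subset\!=} \Pi_{\mu t}$. We consider the properties of the Poisson process. First of all, for each $t$, $\xi(t)$ takes only integer values $0, 1, 2, \ldots$. Divide the interval $[0, t)$ into segments $[0, t_1), [t_1, t_2), \ldots, [t_{n-1}, t_n)$ of lengths $\Delta_i = t_i - t_{i-1}$, $i = 1, \ldots, n$. For small $\Delta_i$ the distributions of the increments $\xi(t_i) - \xi(t_{i-1})$ will have the property that

$$\mathbf{P}\bigl(\xi(t_i) - \xi(t_{i-1}) = 0\bigr) = \mathbf{P}\bigl(\xi(\Delta_i) = 0\bigr) = e^{-\mu\Delta_i} = 1 - \mu\Delta_i + O(\Delta_i^2),$$

$$\mathbf{P}\bigl(\xi(t_i) - \xi(t_{i-1}) = 1\bigr) = \mu\Delta_i e^{-\mu\Delta_i} = \mu\Delta_i + O(\Delta_i^2), \tag{19.4.1}$$

$$\mathbf{P}\bigl(\xi(t_i) - \xi(t_{i-1}) \ge 2\bigr) = O(\Delta_i^2).$$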

Consider "embedded" rational partitions $\mathfrak{R}(n) = \{t_1, \ldots, t_n\}$ of the interval $[0, t]$ such that $\mathfrak{R}(n) \subset \mathfrak{R}(n+1)$ and $\bigcup_n \mathfrak{R}(n) = \mathfrak{R}_t$ is the set of all rationals in $[0, t]$.

Note the following three properties.

(1) Let $\nu(n)$ be the number of intervals in the partition $\mathfrak{R}(n)$ on which the increments of the process $\xi$ are non-zero. For each $\omega$, $\nu(n)$ is non-decreasing as $n \to \infty$. Furthermore, the number $\nu(n)$ can be represented as a sum of independent random variables which are equal to 1 if there is an increment on the $i$-th interval and 0 otherwise. Therefore, by (19.4.1),

$$\mathbf{P}\bigl(\nu(n) \ne \xi(t)\bigr) = \mathbf{P}\Bigl( \bigcup_{i=1}^{n} \bigl\{ \xi(t_i) - \xi(t_{i-1}) \ge 2 \bigr\} \cup \bigl\{ \xi(t) - \xi(t_n) \ge 1 \bigr\} \Bigr) = O\Bigl( \sum_{j=1}^{n} \Delta_j^2 + (t - t_n) \Bigr),$$

where $\sum_{j=1}^{n} \Delta_j^2 \le t \max_j \Delta_j \to 0$ as $n \to \infty$, so that a.s.

$$\nu(n) \uparrow \xi(t)$$

as the partitions refine.

(2) Because the maximum length of the intervals $\Delta_j$ tends to 0 as $n \to \infty$, the total length of the intervals containing jumps converges to 0.

Therefore, taking the unions of the remaining adjacent intervals $\Delta_j$ (i.e. where there are no increments of $\xi$), for each $\omega$ we obtain in the limit, as $n \to \infty$, $\xi(t) + 1$ intervals $(0, T_1), (T_1, T_2), \ldots, (T_\nu, t)$ on which the increments of $\xi$ are null.

(3) Finally, by (19.4.1) the probability that at least one of the increments on the intervals $\Delta_j$ exceeds one is $\sum_j O(\Delta_j^2) = o(1)$ as $n \to \infty$, so that, with probability 1, the jumps at the points $T_k$ are equal to 1.

Thus we have shown that, on the segment $[0, t]$, for each $\omega$ there exists a finite number $\xi(t)$ of points $T_1, \ldots, T_{\xi(t)}$ such that $\xi(u)$ takes at the rational points of the intervals $(T_k, T_{k+1})$ one and the same constant value equal to $k$. This means that one can extend the trajectories of the process $\xi(u)$, say, by continuity from the right so that $\xi(u) = k$ for all $u \in [T_k, T_{k+1})$.

Thus, for the original process $\xi(t)$ we have constructed a modification $\tilde{\xi}(t)$ with trajectories in $D_+(T)$. The equivalence of $\xi$ and $\tilde{\xi}$ follows from the very construction since, by virtue of (1),

$$\mathbf{P}\bigl(\tilde{\xi}(t) = \xi(t)\bigr) = \mathbf{P}\bigl(\lim_{n \to \infty} \nu(n) = \xi(t)\bigr) = 1.$$
One usually considers just such right (or left) continuous modifications of the Poisson process. We have already dealt with processes of this kind in Chap. 10, where more general objects (renewal processes) were defined from scratch using trajectories. That the Poisson process is a renewal process is seen from the following considerations. It is easy to establish from relations (19.4.1) that the distributions of the random variables $T_1, T_2 - T_1, T_3 - T_2, \ldots$ coincide and that these variables are independent. Indeed, the difference $T_j - T_{j-1}$, $j \ge 1$, $T_0 = 0$, can be approximated by the sum $(\gamma_j - \gamma_{j-1})\Delta$ of the lengths of identical intervals of size $\Delta_i = \Delta$, where $\gamma_j$ is the number of the interval in which the $j$-th non-zero increment of $\xi$ occurred. Since the process $\xi(t)$ is homogeneous with independent increments, we have

$$\mathbf{P}\bigl((\gamma_j - \gamma_{j-1})\Delta > u\bigr) = \mathbf{P}\Bigl(\gamma_1 > \frac{u}{\Delta}\Bigr) = \bigl(e^{-\mu\Delta}\bigr)^{\lfloor u/\Delta \rfloor} \to e^{-\mu u},$$

$$\mathbf{P}\bigl((\gamma_j - \gamma_{j-1})\Delta > u\bigr) \to \mathbf{P}(T_j - T_{j-1} > u)$$

as $\Delta \to 0$. Hence the variables $\tau_j := T_j - T_{j-1}$, $j = 1, 2, 3, \ldots$, have the exponential distribution, and the value $\xi(t) + 1$ can be considered as the first crossing time of the level $t$ by the sums $T_j$:

$$\xi(t) = \max\{k : T_k \le t\}, \quad \xi(t) + 1 = \min\{k : T_k > t\}.$$

Thus we obtain that the Poisson process $\xi(t)$ coincides with the renewal process $\eta(t)$ (see Chap. 10) for exponentially distributed variables $\tau_1, \tau_2, \ldots$ with $\mathbf{P}(\tau_j > u) = e^{-\mu u}$.
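The renewal description gives a direct way to generate a right-continuous Poisson trajectory: draw independent exponential gaps with rate $\mu$, accumulate them into jump points $T_1 < T_2 < \cdots$, and count how many of them do not exceed $t$. The following standard-library sketch illustrates this; the names `jump_times` and `count`, the rate and the horizon are illustrative assumptions.

```python
import bisect
import random

def jump_times(mu, horizon, rng):
    """Jump points T_1 < T_2 < ... not exceeding `horizon`, built from
    independent exponential gaps tau_j with P(tau_j > u) = exp(-mu * u)."""
    times, current = [], 0.0
    while True:
        current += rng.expovariate(mu)
        if current > horizon:
            return times
        times.append(current)

def count(times, t):
    """xi(t) = max{k : T_k <= t}, the right-continuous counting function
    (0 when no jump point is <= t)."""
    return bisect.bisect_right(times, t)

rng = random.Random(1)
times = jump_times(mu=3.0, horizon=10.0, rng=rng)
value_at_horizon = count(times, 10.0)   # xi(10) for this trajectory
```

Because `bisect_right` counts the jump points that are less than or equal to `t`, the trajectory takes the value `k` on the whole half-open interval between the `k`-th and the `(k+1)`-th jump point.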

The above and the properties of the Poisson process also imply the following remarkable property of exponentially distributed random variables. The numbers of jump points (i.e. sums $T_k$) which fall into disjoint time intervals $\delta_j$ are independent, these numbers being distributed according to the Poisson laws with parameters $\mu\delta_j$.

Using the last fact, one can construct a more general model of a pure jump homogeneous process with independent increments. Consider an arbitrary sequence of independent identically distributed random variables $\zeta_1, \zeta_2, \ldots$ that have a ch.f. $\beta(\lambda)$ and are independent of the $\sigma$-algebra generated by the process $\xi(t)$. Construct now a new process $\zeta(t)$ as follows. To each $\omega$ we put into correspondence a new trajectory obtained from the trajectory $\xi(t)$ by replacing the first unit jump with the variable $\zeta_1$, the second one with the variable $\zeta_2$, and so on. It is easy to see that $\zeta(t)$ will also be a process with independent increments. The value $\zeta(t)$ will be equal to the sum

$$\zeta(t) = \zeta_1 + \cdots + \zeta_{\xi(t)} \tag{19.4.2}$$

of the random number $\xi(t)$ of random variables $\zeta_1, \zeta_2, \ldots$, where $\xi(t)$ is independent of $\{\zeta_k\}$ by construction.

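Hence, by the total probability formula,

$$\mathbf{E} e^{i\lambda\zeta(t)} = \sum_{k=0}^{\infty} \mathbf{P}\bigl(\xi(t) = k\bigr)\, \mathbf{E} e^{i\lambda(\zeta_1 + \cdots + \zeta_k)} = \sum_{k=0}^{\infty} \frac{(\mu t)^k}{k!}\, e^{-\mu t} \bigl(\beta(\lambda)\bigr)^k = e^{-\mu t + \mu t\beta(\lambda)} = e^{\mu t(\beta(\lambda) - 1)}. \tag{19.4.3}$$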
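The total probability computation for the compound process can be checked numerically: the series $\sum_k \mathbf{P}(\xi(t) = k)\,\beta(\lambda)^k$ with Poisson weights should agree with the closed form $e^{\mu t(\beta(\lambda) - 1)}$. The sketch below uses only the standard library; the function names, the truncation at 200 terms and the standard normal jump distribution are illustrative assumptions.

```python
import cmath
import math

def compound_poisson_chf_series(lam, t, mu, beta, terms=200):
    """Partial sum of sum_k P(xi(t) = k) * beta(lam)^k, where xi(t)
    has the Poisson distribution with parameter mu * t."""
    b = beta(lam)
    total = 0.0 + 0.0j
    weight = math.exp(-mu * t)        # P(xi(t) = 0)
    power = 1.0 + 0.0j                # beta(lam)^0
    for k in range(terms):
        total += weight * power
        weight *= mu * t / (k + 1)    # P(xi(t) = k + 1) from P(xi(t) = k)
        power *= b
    return total

def compound_poisson_chf_closed(lam, t, mu, beta):
    """Closed form exp(mu * t * (beta(lam) - 1))."""
    return cmath.exp(mu * t * (beta(lam) - 1.0))

def normal_jump_chf(lam):
    """Illustrative jump distribution: standard normal jumps."""
    return cmath.exp(-lam * lam / 2.0)

series_value = compound_poisson_chf_series(1.3, 2.0, 1.5, normal_jump_chf)
closed_value = compound_poisson_chf_closed(1.3, 2.0, 1.5, normal_jump_chf)
```

Taking `beta` to be the ch.f. of a unit jump, $e^{i\lambda}$, recovers the ch.f. of the ordinary Poisson process.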
Definition 19.4.2 The process $\zeta(t)$ defined by formula (19.4.2) or ch.f. (19.4.3) is called a compound Poisson process. It is evidently a special case of the generalised renewal process (see Sect. 10.6).

As we have already noted, it is again a homogeneous process with independent increments. In formula (19.4.3), the parameter $\mu$ determines the jumps' intensity in the process $\zeta(t)$, while the ch.f. $\beta(\lambda)$ specifies their distribution. If we add a constant "drift" $qt$ to the process $\zeta(t)$, then $\tilde{\zeta}(t) = \zeta(t) + qt$ will clearly also be a homogeneous process with independent increments having the ch.f. $\mathbf{E} e^{i\lambda\tilde{\zeta}(t)} = e^{t(i\lambda q + \mu(\beta(\lambda) - 1))}$.

Finally, if a Wiener process $w(t)$ with zero drift and diffusion coefficient $\sigma$ is given on the same probability space and is independent of $\tilde{\zeta}(t)$, and to each $\omega$ we put into correspondence a trajectory of $\tilde{\zeta}(t) + w(t)$, we again obtain a process with independent increments, with ch.f. $\exp\{t(i\lambda q + \mu(\beta(\lambda) - 1) - \lambda^2\sigma^2/2)\}$.

One  should  note,  however,  that  these  constructions  by  no  means  exhaust  the 
whole  class  of  processes  with  independent  increments  (and  therefore  the  class  of 
infinitely  divisible  distributions). 

A  description  of  the  entire  class  will  be  given  in  the  next  section. 

The Poisson processes, as well as Wiener processes, are often used as mathematical models in various applications. For example, the process of counts of cosmic particles of certain energy registered by a sensor in a given volume, or of collisions of elementary particles in an accelerator, is described by the Poisson process. The same is true of the process of incoming telephone calls at a switchboard and many other processes.

Due to representation (19.4.2), the study of compound Poisson processes reduces, in many aspects, to the study of the properties of sums of independent random variables.


19.5  Description  of  the  Class  of  Processes  with  Independent 
Increments 


We saw in Theorem 19.1.1 that, to describe the class of distributions of stochastically continuous processes with independent increments, it suffices to describe the class of all infinitely divisible distributions. Let, as before, $\mathfrak{L}$ be the class of the ch.f.s of infinitely divisible distributions.

Lemma 19.5.1 The class $\mathfrak{L}$ is closed with respect to the operations of multiplication and passage to the limit (when the limit is again a ch.f.).

Proof (1) Let $\varphi_1 \in \mathfrak{L}$ and $\varphi_2 \in \mathfrak{L}$. Then $\varphi_1\varphi_2 = (\varphi_1^{1/n} \varphi_2^{1/n})^n$, where $\varphi_1^{1/n} \varphi_2^{1/n}$ is a ch.f.

(2) Let $\varphi_n \in \mathfrak{L}$, $\varphi_n \to \varphi$, and $\varphi$ be a ch.f. Then, for any $m$, $\varphi_n^{1/m} \to \varphi^{1/m}$ as $n \to \infty$, where $\varphi^{1/m}$ is continuous at zero and hence is a ch.f. The lemma is proved. □
Denote by $\mathfrak{L}_\Pi \subset \mathfrak{L}$ the class of ch.f.s whose logarithms have the form

$$\ln\varphi(\lambda) = i\lambda q + \sum_k c_k\bigl(e^{i\lambda b_k} - 1\bigr), \quad c_k > 0, \quad \sum_k c_k < \infty.$$

We will call this the Poisson class. We already know that it corresponds to compound Poisson processes with drift $q$ and intensities $c_k$ of jumps of size $b_k$ (note that $\sum_k c_k(e^{i\lambda b_k} - 1) = (\sum_k c_k)\,\mathbf{E}(e^{i\lambda\zeta} - 1)$, where $\zeta$ assumes the values $b_k$ with probabilities $c_k / \sum_j c_j$).

Lemma 19.5.2 A ch.f. $\varphi$ belongs to $\mathfrak{L}$ if and only if $\varphi = \lim_{n \to \infty} \varphi_n$, $\varphi_n \in \mathfrak{L}_\Pi$.


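Proof Sufficiency. Let

$$\ln\varphi_n = \sum_k \bigl( i\lambda q_{k,n} + c_{k,n}(e^{i\lambda b_{k,n}} - 1) \bigr),$$

and $\varphi = \lim \varphi_n$ be a ch.f. It is evident that $\varphi_n^{1/m} \in \mathfrak{L}_\Pi \subset \mathfrak{L}$ and $\varphi_n^{1/m} \to \varphi^{1/m}$. Therefore $\varphi^{1/m}$, being a limit of a sequence of ch.f.s which is continuous at zero, is a ch.f. itself, so that $\varphi \in \mathfrak{L}$.

Necessity. Let $\varphi \in \mathfrak{L}$. Then $\varphi(\lambda) \ne 0$ and there exists $\beta := \ln\varphi$ with $n(\varphi^{1/n} - 1) \to \beta$, and

$$\varphi^{1/n} - 1 = \int (e^{i\lambda x} - 1)\, dF_n(x).$$

The integral of the continuous function on the right-hand side can be viewed as a Riemann–Stieltjes integral. This means that for $F_n$ there exists a partition of the real axis into intervals $\Delta_{nk}$ such that, for $x_{nk} \in \Delta_{nk}$ and $|r_n| < cn^{-2}$,

$$\int (e^{i\lambda x} - 1)\, dF_n(x) = \sum_k (e^{i\lambda x_{nk}} - 1)\, P_n(\Delta_{nk}) + r_n$$

($P_n(\Delta)$ is the probability of hitting the interval $\Delta$ corresponding to $F_n$). We obtain

$$\beta = \lim_{n \to \infty} n(\varphi^{1/n} - 1) = \lim_{n \to \infty} \sum_k n(e^{i\lambda x_{nk}} - 1)\, P_n(\Delta_{nk}).$$

The lemma is proved. □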


Theorem 19.5.1 (Lévy–Khintchin) A ch.f. $\varphi$ belongs to $\mathfrak{L}$ if and only if the function $\beta := \ln\varphi$ admits a representation of the form

$$\beta = \beta(\lambda; q, \Psi) = i\lambda q + \int \Bigl( e^{i\lambda x} - 1 - \frac{i\lambda x}{1 + x^2} \Bigr) \frac{1 + x^2}{x^2}\, d\Psi(x), \tag{19.5.1}$$

where $\Psi$ is a non-decreasing function of bounded variation (i.e., a distribution function up to a constant factor), the integrand being assumed equal to $-\lambda^2/2$ at the point $x = 0$ (by continuity).
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Proof  Assume  that  /3  has  the  form  (19.5.1).  Then  /3(X)  is  a  continuous  function, 
since  it  is  (up  to  a  continuous  additive  term  iXa)  a  uniformly  convergent  integral  of 
a  continuous  bounded  function.  Further,  let  xnk  0,  k  =  1 , . . . ,  n,  be  points  of  refin¬ 
ing  partitions  of  intervals  [— *Jn).  Then  /3°(A)  =  /3(A)  —  ikq  can  be  represented 
as  /3°  =  lim  /3n  with 


n 

:=  ^2\}^kn  +  Ckn  (e^^kn  —  l)]  G  /C/7, 
k=\ 


where,  under  a  natural  notational  convention,  one  should  put 


1  T  x 


ckn  — 


kn 


X 


([  fkn  >  Xk+\,n))  1 


kn 


L Jkn  —  ^  {\Xkn  >  %k+\,n))'i 

%kn 


bkn  —  %kn  ■> 


*F  being  used  to  denote  the  measure  *P (A)  =  fAd'P(x).  We  obtain  that  cp  is  a  limit 
of  the  sequence  of  ch.f.s  cpn  g  £77.  It  remains  to  make  use  of  Lemma  19.5.2. 
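The approximation of the integral in (19.5.1) by Poisson-class sums can be tried out numerically. The following is a minimal sketch under assumptions that are not in the text: the spectral function is taken to have the density $e^{-x^2}$, the partition is uniform on $[-6, 6)$ with an odd number of cells (so that no point $x_k$ equals 0), and all function names are illustrative.

```python
import cmath
import math

def lk_integrand(lam, x):
    """Integrand of (19.5.1); equals -lam^2/2 at x = 0 by continuity."""
    if x == 0.0:
        return complex(-lam * lam / 2.0)
    return ((cmath.exp(1j * lam * x) - 1 - 1j * lam * x / (1 + x * x))
            * (1 + x * x) / (x * x))

def psi_mass(a, b):
    """Psi([a, b)) for the hypothetical density exp(-x^2)."""
    return 0.5 * math.sqrt(math.pi) * (math.erf(b) - math.erf(a))

def poisson_class_beta(lam, half_width, cells):
    """Sum_k [i*lam*q_k + c_k (exp(i*lam*b_k) - 1)] over a uniform partition.

    `cells` must be odd so that no left endpoint x_k equals 0.
    """
    h = 2.0 * half_width / cells
    total = 0j
    for k in range(cells):
        x = -half_width + k * h               # b_k = x_k (left endpoint)
        mass = psi_mass(x, x + h)
        c = (1 + x * x) / (x * x) * mass      # jump intensity c_k
        q = -mass / x                         # drift term q_k
        total += 1j * lam * q + c * (cmath.exp(1j * lam * x) - 1)
    return total

def reference_beta0(lam, half_width=6.0, steps=20000):
    """Midpoint-rule value of the integral in (19.5.1) for the same density."""
    h = 2.0 * half_width / steps
    total = 0j
    for j in range(steps):
        x = -half_width + (j + 0.5) * h
        total += lk_integrand(lam, x) * math.exp(-x * x) * h
    return total

gap = abs(poisson_class_beta(1.7, 6.0, 2001) - reference_beta0(1.7))
```

Each summand equals the integrand at $x_k$ times the mass of the cell, which is why the sum is at once a Riemann-Stieltjes sum for the integral and an exponent of the Poisson class.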

Now let $\varphi \in \mathcal{L}$. Then

$$\beta = \lim n(\varphi^{1/n} - 1) = \lim \int (e^{i\lambda x} - 1)\,n\,dF_n(x) = \lim \biggl[ i\lambda \int \frac{nx}{1+x^2}\,dF_n(x) + \int \Bigl(e^{i\lambda x} - 1 - \frac{i\lambda x}{1+x^2}\Bigr)\frac{1+x^2}{x^2}\,\frac{nx^2}{1+x^2}\,dF_n(x) \biggr]. \tag{19.5.2}$$

If we put

$$q_n := \int \frac{nx}{1+x^2}\,dF_n(x), \qquad \Psi_n(x) := \int_{-\infty}^{x} \frac{nu^2}{1+u^2}\,dF_n(u), \tag{19.5.3}$$

then on the right-hand side of (19.5.2) we will have $\lim \beta_n$, $\beta_n = \beta(\lambda; q_n, \Psi_n)$.

Now assume for a moment that the following continuity theorem holds for functions from $\mathcal{L}$.

Lemma 19.5.3 If $\beta_n = \beta(\lambda; q_n, \Psi_n) \to \beta$ and $\beta$ is continuous at the point $\lambda = 0$, then $\beta(\lambda)$ has the form $\beta(\lambda; q, \Psi)$, $q_n \to q$ and $\Psi_n \Rightarrow \Psi$.

The symbol $\Rightarrow$ in the lemma means convergence at the points of continuity of the limiting function (as in the case of distribution functions) and that $\Psi_n(\pm\infty) \to \Psi(\pm\infty)$.

If the lemma is true, the required assertion of the theorem will follow in an obvious way from (19.5.2) and (19.5.3). It remains to prove the lemma.

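Proof of Lemma 19.5.3 Observe first that the correspondence $\beta(\lambda; q, \Psi) \leftrightarrow (q, \Psi)$ is one-to-one. Since in one direction it is obvious, we only have to verify that $\beta$ uniquely determines $q$ and $\Psi$. To each $\beta$ we put into correspondence the function

$$\gamma(\lambda) = \int_0^1 \Bigl[\beta(\lambda) - \tfrac12\bigl(\beta(\lambda+h) + \beta(\lambda-h)\bigr)\Bigr]dh = \int_0^1\!\!\int e^{i\lambda x}(1 - \cos hx)\,\frac{1+x^2}{x^2}\,d\Psi(x)\,dh,$$

where

$$\tfrac12\bigl(e^{i(\lambda+h)x} + e^{i(\lambda-h)x}\bigr) = e^{i\lambda x}\cos hx, \qquad \int_0^1 e^{i\lambda x}(1 - \cos hx)\,dh = e^{i\lambda x}\Bigl(1 - \frac{\sin x}{x}\Bigr),$$

$$0 < c_1 < \Bigl(1 - \frac{\sin x}{x}\Bigr)\frac{1+x^2}{x^2} < c_2 < \infty.$$

Therefore

$$\gamma(\lambda) = \int e^{i\lambda x}\,d\Gamma(x),$$

where

$$\Gamma(x) = \int_{-\infty}^{x} \Bigl(1 - \frac{\sin u}{u}\Bigr)\frac{1+u^2}{u^2}\,d\Psi(u)$$

is (up to a constant multiplier) a distribution function, for which $\gamma(\lambda)$ plays the role of its ch.f. Clearly,

$$\Psi(x) = \int_{-\infty}^{x} \Bigl(1 - \frac{\sin u}{u}\Bigr)^{-1}\frac{u^2}{1+u^2}\,d\Gamma(u),$$

so that we obtained a chain of univalent correspondences $\beta \to \gamma \to \Gamma \to \Psi$ which proves the assertion.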
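The two elementary facts used in this step, the value of the averaged kernel and the two-sided bound on the weight, are easy to check numerically. The sketch below uses a plain midpoint rule; the function names and the grid are illustrative choices, not part of the text.

```python
import math

def averaged_kernel(x, steps=20000):
    """Midpoint-rule approximation of the integral over h in [0, 1] of 1 - cos(h*x)."""
    dh = 1.0 / steps
    return sum(1.0 - math.cos((j + 0.5) * dh * x) for j in range(steps)) * dh

def closed_form(x):
    """1 - sin(x)/x, the claimed value of the integral (x != 0)."""
    return 1.0 - math.sin(x) / x

def weight(x):
    """(1 - sin(x)/x) * (1 + x^2) / x^2, the factor relating dGamma to dPsi."""
    return closed_form(x) * (1.0 + x * x) / (x * x)

# Smallest and largest value of the weight on a grid of positive points
# (the weight is an even function, so positive x suffice).
grid = [0.01 * j for j in range(1, 5001)]
lowest = min(weight(x) for x in grid)
highest = max(weight(x) for x in grid)
```

On this grid the weight stays strictly between two positive constants, which is what makes the passage from $\Gamma$ back to $\Psi$ legitimate.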

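We return to the proof of Lemma 19.5.3. Because $e^{\beta_n} \to e^{\beta}$, each $e^{\beta_n}$ is a ch.f., and $e^{\beta}$ is continuous at the point $\lambda = 0$, we see that $e^{\beta}$ is a ch.f. and hence a continuous function. This means that the convergence $\varphi_n \to \varphi$ is uniform on any interval,

$$\gamma_n(\lambda) = \int_0^1 \Bigl[\beta_n(\lambda) - \tfrac12\bigl(\beta_n(\lambda+h) + \beta_n(\lambda-h)\bigr)\Bigr]dh \to \int_0^1 \Bigl[\beta(\lambda) - \tfrac12\bigl(\beta(\lambda+h) + \beta(\lambda-h)\bigr)\Bigr]dh =: \gamma(\lambda),$$

and the function $\gamma(\lambda)$ is continuous. By the continuity theorem for ch.f.s, this means that $\gamma(\lambda)$ is a ch.f. (of a finite measure $\Gamma$), $\Gamma_n \Rightarrow \Gamma$ (where $\Gamma_n$ is the preimage of $\gamma_n$), $\Psi_n \Rightarrow \Psi$ and $q_n \to q$. Thus we establish that

$$\beta = \lim \beta_n = \lim \biggl[ i\lambda q_n + \int \Bigl(e^{i\lambda x} - 1 - \frac{i\lambda x}{1+x^2}\Bigr)\frac{1+x^2}{x^2}\,d\Psi_n(x) \biggr] = i\lambda q + \int \Bigl(e^{i\lambda x} - 1 - \frac{i\lambda x}{1+x^2}\Bigr)\frac{1+x^2}{x^2}\,d\Psi(x) = \beta(\lambda; q, \Psi).$$

Lemma 19.5.3 is proved. □

Theorem 19.5.1 is proved. □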


Now we will make several remarks in regard to the structure of the process $\xi(t)$ and its relationship to representation (19.5.1). The function $\Psi$ in (19.5.1) corresponds to the so-called spectral measure of the process $\xi(t)$ (recall that we agreed to use the same symbol $\Psi$ for the measure itself: $\Psi(A) = \int_A d\Psi(x)$). It can be represented in the form $\mu\Psi_1(x)$, where $\mu = \Psi(\infty) - \Psi(-\infty)$ and $\Psi_1(x)$ is a distribution function.

(1) The spectral measure of the Wiener process is concentrated at the point 0. If $\Psi(\{0\}) = \sigma^2$, then $\xi(1) \in \Phi_{q,\sigma^2}$, i.e. $\xi(1)$ is normally distributed with parameters $(q, \sigma^2)$.

(2) The spectral measure $\Psi$ of a compound Poisson process has the property

$$\int x^{-2}\,d\Psi(x) < \infty.$$

In that case

$$G(x) = \int_{-\infty}^{x} \frac{1+u^2}{u^2}\,d\Psi(u)$$

possesses the properties of a distribution function, and $\beta$ may be written in the form

$$\beta = i\lambda q_1 + \int (e^{i\lambda x} - 1)\,dG(x),$$

where

$$q_1 = q - \int x^{-1}\,d\Psi(x).$$

(3) Consider now the general case, but under the condition that $\Psi(\{0\}) = 0$. As we know, the function $\beta$ can be approximated for small $\Delta$ by expressions of the form (we put $\Delta_k = [(k-1)\Delta, k\Delta)$)

$$i\lambda q + \sum_{\substack{k=-\infty \\ k \ne 0}}^{\infty} \biggl[ -\frac{i\lambda}{k\Delta}\,\Psi(\Delta_k) + (e^{i\lambda k\Delta} - 1)\,\frac{1 + (k\Delta)^2}{(k\Delta)^2}\,\Psi(\Delta_k) \biggr],$$

which corresponds to the sum of Poisson processes with jumps of sizes $k\Delta$ of the respective intensities

$$\frac{1 + (k\Delta)^2}{(k\Delta)^2}\,\Psi(\Delta_k).$$

If, say,

$$\int_{+0}^{\infty} \frac{d\Psi(x)}{x^2} = \infty,$$

then for any $\varepsilon > 0$ the total intensity of these processes with jumps from the interval $(0, \varepsilon)$ will increase to infinity as $\Delta \to 0$. This means that, with probability 1, on any time interval $\delta$ there will be at least one jump of size smaller than any given $\varepsilon > 0$, so that the trajectories of $\xi(t)$ will be everywhere discontinuous. To “compensate” these jumps, a drift of size $\Psi(\Delta_k)/k\Delta$ is added, the “total value” of such drifts being possibly unbounded (if $\int_{+0}^{\infty} x^{-1}\,d\Psi(x) = \infty$).
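The growth of the total intensity of small jumps can be seen in a toy case. The sketch below assumes, purely for illustration, that $\Psi$ has density 1 on $(0, 1)$, so that $\Psi(\Delta_k) = \Delta$ for cells inside $(0, 1)$ and both integrals above diverge at zero; the function names are not from the text.

```python
def total_small_jump_intensity(delta, eps):
    """Sum of the intensities (1 + (k*delta)^2) / (k*delta)^2 * Psi(Delta_k)
    over jump sizes k*delta in (0, eps), with Psi(Delta_k) = delta (eps <= 1)."""
    total = 0.0
    k = 1
    while k * delta < eps:
        size = k * delta
        total += (1.0 + size * size) / (size * size) * delta
        k += 1
    return total

def total_compensating_drift(delta, eps):
    """Sum of the drifts Psi(Delta_k) / (k*delta) over the same jump sizes."""
    total = 0.0
    k = 1
    while k * delta < eps:
        total += delta / (k * delta)
        k += 1
    return total

intensities = [total_small_jump_intensity(d, 0.5) for d in (0.1, 0.01, 0.001)]
drifts = [total_compensating_drift(d, 0.5) for d in (0.1, 0.01, 0.001)]
```

In this example the intensity grows like $1/\Delta$ while the compensating drift grows only logarithmically, in line with the divergence of $\int_{+0} x^{-2}\,d\Psi$ and $\int_{+0} x^{-1}\,d\Psi$ respectively.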

(4) For stable processes (see Sect. 8.8) the functions $\Psi(x)$ have power “branches”, smooth on the half-axes, possessing the property $c_1\Psi(x) = \Psi(c_2 x)$ for appropriate $c_1$ and $c_2$.


Chapter  20 

Functional  Limit  Theorems 


Abstract The chapter begins with Sect. 20.1 presenting the classical Functional Central Limit Theorem in the triangular array scheme. It establishes not only convergence of the distributions of the scaled trajectories of random walks to that of the Wiener process, but also convergence rates for Lipschitz sets and distribution functions of Lipschitz functionals in the case of finite third moments when the Lyapunov condition is met. Section 20.2 uses the Law of the Iterated Logarithm for the Wiener process to establish such a law for the trajectory of a random walk with independent non-identically distributed jumps. Section 20.3 is devoted to proving convergence to the Poisson process of the processes of cumulative sums of independent random indicators with low success probabilities and also that of the so-called thinning renewal processes.

20.1  Convergence  to  the  Wiener  Process 

We have already pointed out in Sect. 19.2 that the Wiener processes are, in a certain sense, limiting to random polygons with vertices at the points $(k/n, S_k/\sqrt{n})$, where $S_k = \xi_1 + \cdots + \xi_k$ are partial sums of independent identically distributed random variables $\xi_1, \xi_2, \ldots$ with zero means and finite variances. Now we will give a more precise and general meaning to this statement.

Let

$$\xi_{1,n}, \ldots, \xi_{n,n} \tag{20.1.1}$$

be independent random variables in the triangular array scheme (see Sects. 8.3, 8.4),

$$\zeta_{k,n} := \sum_{j=1}^{k} \xi_{j,n},$$

that have finite third moments $\mathbf{E}|\xi_{k,n}|^3 = \mu_{k,n} < \infty$.

We will assume without loss of generality (see Sect. 8.4) that

$$\mathbf{E}\xi_{k,n} = 0, \qquad \operatorname{Var}(\xi_{k,n}) = \sigma_{k,n}^2, \qquad \sum_{k=1}^{n} \sigma_{k,n}^2 = 1.$$

Put

$$t_{k,n} := \sum_{j=1}^{k} \sigma_{j,n}^2,$$

so that $t_{0,n} = 0$, $t_{n,n} = 1$, and consider a random polygon with vertices at the points $(t_k, \zeta_k)$, where we suppress the second subscript $n$ for brevity’s sake: $t_k = t_{k,n}$, $\zeta_k = \zeta_{k,n}$.

Fig. 20.1 The random polygon $s_n(t)$ constructed from the random walk

We obtain a random process on $[0, 1]$ with continuous trajectories, which will be denoted by $s_n = s_n(t)$ (see Fig. 20.1). The functional limit theorem (or invariance principle; the motivation behind this second name will be commented on below) states that for any functional $f$ given on the space $C(0, 1)$ and continuous in the uniform metric, the distribution of $f(s_n)$ converges weakly to that of $f(w)$ as $n \to \infty$:

$$f(s_n) \Rightarrow f(w), \tag{20.1.2}$$

where $w = w(t)$ is the standard Wiener process. The conventional central limit theorem is a special case of this statement (one should take $f(x)$ to be $x(1)$).
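The construction of the polygon $s_n$ and the evaluation of a functional on it can be sketched in a few lines. The code below is an illustration only: the array $\xi_{k,n} = \pm 1/\sqrt{n}$ with equal probabilities and all function names are assumptions made for the example.

```python
import bisect
import math
import random

def polygon_nodes(xi, sigma2):
    """Vertices (t_k, zeta_k): cumulative variances and cumulative sums."""
    t, zeta = [0.0], [0.0]
    for x, s2 in zip(xi, sigma2):
        t.append(t[-1] + s2)
        zeta.append(zeta[-1] + x)
    return t, zeta

def s_n(t, zeta, u):
    """Value at time u of the polygon through the vertices, by linear interpolation."""
    k = bisect.bisect_right(t, u)
    k = max(1, min(k, len(t) - 1))
    weight = (u - t[k - 1]) / (t[k] - t[k - 1])
    return zeta[k - 1] + weight * (zeta[k] - zeta[k - 1])

def sample_polygon(n, rng):
    """Hypothetical example array: xi_{k,n} = +-1/sqrt(n), variances 1/n."""
    xi = [rng.choice((-1.0, 1.0)) / math.sqrt(n) for _ in range(n)]
    return polygon_nodes(xi, [1.0 / n] * n)

rng = random.Random(0)
t, zeta = sample_polygon(1000, rng)
endpoint = s_n(t, zeta, 1.0)   # the functional f(x) = x(1) of the central limit theorem
maximum = max(zeta)            # f(x) = sup x(t); for a polygon it is attained at a vertex
```

Repeating the last three lines for many seeds gives an empirical distribution of $f(s_n)$ that can be compared with the distribution of $f(w)$.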

The above assertion is equivalent to each of the following two statements:

1. For any bounded continuous functional $f$,

$$\mathbf{E}f(s_n) \to \mathbf{E}f(w), \qquad n \to \infty. \tag{20.1.3}$$

2. For any set $G$ from the $\sigma$-algebra $\mathfrak{B}_{C(0,1)}$ of Borel sets in the space $C(0, 1)$ ($\mathfrak{B}_{C(0,1)}$ is generated by open balls in the metric space $C(0, 1)$ endowed with the uniform distance $\rho$; as we already noted, $\mathfrak{B}_{C(0,1)} = \mathfrak{B}_C^{[0,1]}$) such that $\mathbf{P}(w \in \partial G) = 0$, where $\partial G$ is the boundary of the set $G$, one has

$$\mathbf{P}(s_n \in G) \to \mathbf{P}(w \in G), \qquad n \to \infty. \tag{20.1.4}$$

Relations (20.1.3) and (20.1.4) are equivalent definitions of weak convergence of the distributions $\mathbf{P}_n$ of the processes $s_n$ to the distribution $W$ of the Wiener process $w$ in the space $(C(0, 1), \mathfrak{B}_{C(0,1)})$. More details can be found in Appendix 3 and in [1] and [14].

The main results of the present section are the following theorems.

As before, put $L_3 := \sum_{k=1}^{n} \mu_{k,n}$.

Theorem 20.1.1 Let $L_3 \to 0$ as $n \to \infty$ (the Lyapunov condition). Then the convergence relations (20.1.2)-(20.1.4) hold true.

Remark 20.1.1 The condition $L_3 \to 0$ can be relaxed here to the Lindeberg condition. In this version the above convergence theorem is known under the name of the Donsker-Prokhorov invariance principle.


Along  with  Theorem  20.1.1  we  will  obtain  a  more  precise  assertion. 

Definition 20.1.1 A set $G$ is said to be Lipschitz if $W(G^{(\varepsilon)}) - W(G) < c\varepsilon$ for some $c < \infty$, where $G^{(\varepsilon)}$ is the $\varepsilon$-neighbourhood of $G$ and $W$ is the measure corresponding to the Wiener process.


In  the  sequel  we  will  denote  by  the  letter  c  (with  or  without  subscripts)  absolute 
constants,  possibly  having  different  values. 


Theorem 20.1.2 If $G$ is a Lipschitz set, then

$$\bigl|\mathbf{P}(s_n \in G) - \mathbf{P}(w \in G)\bigr| < cL_3^{1/4}. \tag{20.1.5}$$

In the case when $\xi_{k,n} = \xi_k/\sqrt{n}$, where the $\xi_k$ do not depend on $n$ and are identically distributed with $\mathbf{E}\xi_k = 0$ and $\operatorname{Var}(\xi_k) = 1$, the right-hand side of (20.1.5) becomes $cn^{-1/8}$.
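The exponent $-1/8$ comes from a one-line calculation. Writing $\mu := \mathbf{E}|\xi_1|^3$ (a symbol introduced here only for this illustration, assumed finite), one has

$$L_3 = \sum_{k=1}^{n} \mathbf{E}\Bigl|\frac{\xi_k}{\sqrt{n}}\Bigr|^3 = n \cdot \frac{\mu}{n^{3/2}} = \frac{\mu}{\sqrt{n}}, \qquad\text{so that}\qquad cL_3^{1/4} = c\,\mu^{1/4}\,n^{-1/8}.$$

The factor $\mu^{1/4}$ is then absorbed into the constant.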

A similar bound can be obtained for functionals. A functional $f$ on $C(0, 1)$ is said to be Lipschitz if the following two conditions are met:

(1) $|f(x) - f(y)| < c\rho(x, y)$;

(2) the distribution of $f(w)$ has a bounded density.

Corollary 20.1.1 If $f$ is a Lipschitz functional, then $G_v := \{f(x) < v\}$ is a Lipschitz set (with one and the same constant for all $v$), so that by Theorem 20.1.2

$$\sup_v \bigl|\mathbf{P}(f(w) < v) - \mathbf{P}(f(s_n) < v)\bigr| < cL_3^{1/4}.$$

The  above  theorems  are  consequences  of  Theorem  20.1.3  to  be  stated  below. 

Let

$$\eta_{1,n}, \ldots, \eta_{n,n} \tag{20.1.6}$$

be any other sequence of independent identically distributed random variables in the triangular array scheme with the same two first moments $\mathbf{E}\eta_{k,n} = 0$, $\mathbf{E}\eta_{k,n}^2 = \sigma_{k,n}^2$, and finite third moments. Denote by $F_{k,n}$ and $\Phi_{k,n}$ the distribution functions of $\xi_{k,n}$ and $\eta_{k,n}$, respectively, and put

$$\nu_{k,n} := \mathbf{E}|\eta_{k,n}|^3 < \infty, \qquad N_3 := \sum_{k=1}^{n} \nu_{k,n},$$

$$\mu_{k,n}^0 := \int |x|^3\,\bigl|d\bigl(F_{k,n}(x) - \Phi_{k,n}(x)\bigr)\bigr| \le \mu_{k,n} + \nu_{k,n},$$

$$L_3^0 := \sum_{k=1}^{n} \mu_{k,n}^0 \le L_3 + N_3.$$

Denote by $s'_n(t)$ the random process constructed in the same way as $s_n(t)$ but using the sequence $\{\eta_{k,n}\}$.

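Theorem 20.1.3 For any $A \in \mathfrak{B}_{C(0,1)}$ and any $\varepsilon > 0$,

$$\mathbf{P}(s_n \in A) \le \mathbf{P}\bigl(s'_n \in A^{(2\varepsilon)}\bigr) + \frac{cL_3^0}{\varepsilon^3}.$$

In order to prove Theorem 20.1.3, we will first obtain its finite-dimensional analogue. Denote by $\zeta$ and $\eta$ the vectors $\zeta = (\zeta_1, \ldots, \zeta_n)$ and $\eta = (\eta_1, \ldots, \eta_n)$ respectively, where $\zeta_k := \sum_{j=1}^{k} \xi_{j,n}$ and $\eta_k := \sum_{j=1}^{k} \eta_{j,n}$, and by $B^{(\varepsilon)}$ the $\varepsilon$-neighbourhood of a set $B \subset \mathbb{R}^n$:

$$B^{(\varepsilon)} := \bigcup_{\substack{x \in B \\ |v| < \varepsilon}} (x + v),$$

where $x = (x_1, \ldots, x_n)$, $v = (v_1, \ldots, v_n)$, and $|v| = \max_k |v_k|$.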

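Lemma 20.1.1 Let $B$ be an arbitrary Borel subset of $\mathbb{R}^n$. Then, for any $\varepsilon > 0$,

$$\mathbf{P}(\zeta \in B) \le \mathbf{P}\bigl(\eta \in B^{(2\varepsilon)}\bigr) + \frac{cL_3^0}{\varepsilon^3}.$$

Proof Introduce a collection of nested neighbourhoods

$$B^{(\varepsilon)}(k) := \bigcup_{\substack{x \in B \\ |v| < \varepsilon}} (x_1, \ldots, x_k, x_{k+1} + v_{k+1}, \ldots, x_n + v_n), \qquad k = 0, \ldots, n,$$

$$B = B^{(\varepsilon)}(n) \subset B^{(\varepsilon)}(n-1) \subset \cdots \subset B^{(\varepsilon)}(1) \subset B^{(\varepsilon)}(0) = B^{(\varepsilon)},$$

and denote by $e_k$ the vector $(0, \ldots, 0, 1, 0, \ldots, 0)$, where 1 stands in the $k$-th position. It is obvious that if $x \in B^{(\varepsilon)}(k)$, then

$$x + e_k v_k \in B^{(\varepsilon)}(k-1) \quad\text{if } |v_k| < \varepsilon. \tag{20.1.7}$$

Further, together with arrays (20.1.1) and (20.1.6), consider the collection of “transitional” arrays

$$\xi_{1,n}, \ldots, \xi_{k,n}, \eta_{k+1,n}, \ldots, \eta_{n,n}, \qquad k = 0, \ldots, n. \tag{20.1.8}$$

Denote by $\zeta(k) = (\zeta_1(k), \ldots, \zeta_n(k))$ the vectors formed by the cumulative sums of random variables from the $k$-th row (20.1.8), so that

$$\zeta_j(k) = \begin{cases} \zeta_j & \text{for } j \le k, \\ \zeta_k + \eta_{k+1,n} + \cdots + \eta_{j,n} & \text{for } j > k. \end{cases}$$

To continue the proof of Lemma 20.1.1 we need the following.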


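¹ The extension of the approach to proving the central limit theorem used in Sect. 8.5, which is used in this demonstration, was suggested by A.V. Sakhanenko.

Lemma 20.1.2 For any random variable $\delta$ such that $\mathbf{P}(|\delta| < \varepsilon) = 1$, one has

$$\mathbf{P}(\zeta \in B) \le \mathbf{P}\bigl(\eta \in B^{(2\varepsilon)}\bigr) + \sum_{k=1}^{n} \Delta_k, \tag{20.1.9}$$

where

$$\Delta_k = \mathbf{P}\bigl(\zeta(k) + \delta e(k-1) \in B^{(\varepsilon)}(k-1)\bigr) - \mathbf{P}\bigl(\zeta(k-1) + \delta e(k-1) \in B^{(\varepsilon)}(k-1)\bigr),$$

$$e(r) = \sum_{j=r+1}^{n} e_j = (0, \ldots, 0, 1, \ldots, 1).$$

Proof Indeed, by virtue of (20.1.7),

$$\mathbf{P}(\zeta \in B) = \mathbf{P}\bigl(\zeta(n) \in B^{(\varepsilon)}(n)\bigr) \le \mathbf{P}\bigl(\zeta(n) + e(n-1)\delta \in B^{(\varepsilon)}(n-1)\bigr) = \mathbf{P}\bigl(\zeta(n-1) + e(n-1)\delta \in B^{(\varepsilon)}(n-1)\bigr) + \Delta_n.$$

Reapplying the same calculations to the right-hand side, we obtain that

$$\begin{aligned}
\mathbf{P}\bigl(\zeta(n-1) + e(n-1)\delta \in B^{(\varepsilon)}(n-1)\bigr)
&\le \mathbf{P}\bigl(\zeta(n-1) + e(n-1)\delta + e_{n-1}\delta \in B^{(\varepsilon)}(n-2)\bigr) \\
&= \mathbf{P}\bigl(\zeta(n-1) + e(n-2)\delta \in B^{(\varepsilon)}(n-2)\bigr) \\
&= \mathbf{P}\bigl(\zeta(n-2) + e(n-2)\delta \in B^{(\varepsilon)}(n-2)\bigr) + \Delta_{n-1}, \\
&\;\;\vdots \\
\mathbf{P}\bigl(\zeta(1) + e(1)\delta \in B^{(\varepsilon)}(1)\bigr)
&\le \mathbf{P}\bigl(\zeta(1) + e(1)\delta + e_1\delta \in B^{(\varepsilon)}(0)\bigr) \\
&= \mathbf{P}\bigl(\zeta(1) + e(0)\delta \in B^{(\varepsilon)}(0)\bigr) = \mathbf{P}\bigl(\zeta(0) + e(0)\delta \in B^{(\varepsilon)}(0)\bigr) + \Delta_1.
\end{aligned}$$

Since $\zeta(0) = \eta$ and $\mathbf{P}(\eta + e(0)\delta \in B^{(\varepsilon)}) \le \mathbf{P}(\eta \in B^{(2\varepsilon)})$, inequality (20.1.9) is proved. Lemma 20.1.2 is proved. □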

To obtain Lemma 20.1.1, we now have to estimate $\Delta_k$. It will be convenient to consider, along with (20.1.8), the sequences

$$\xi_{1,n}, \ldots, \xi_{k-1,n}, y, \eta_{k+1,n}, \ldots, \eta_{n,n}$$

and denote by $\zeta(k, y) = (\zeta_1(k, y), \ldots, \zeta_n(k, y))$ the respective vectors of cumulative sums, so that

$$\zeta(k, \xi_{k,n}) = \zeta(k) = \zeta(k, 0) + \xi_{k,n}\,e(k-1),$$
$$\zeta(k, \eta_{k,n}) = \zeta(k-1) = \zeta(k, 0) + \eta_{k,n}\,e(k-1).$$

Then $\Delta_k$ can be written in the form

$$\Delta_k = \mathbf{P}\bigl(\zeta(k, 0) + (\delta + \xi_{k,n})e(k-1) \in B^{(\varepsilon)}(k-1)\bigr) - \mathbf{P}\bigl(\zeta(k, 0) + (\delta + \eta_{k,n})e(k-1) \in B^{(\varepsilon)}(k-1)\bigr). \tag{20.1.10}$$

Take $\delta$ to be a random variable independent of $\zeta$ and $\eta$. Then it will be convenient to use conditional expectation to estimate the probabilities participating in (20.1.10), because, for instance, in the equality

$$\mathbf{P}\bigl(\zeta(k, 0) + (\delta + \xi_{k,n})e(k-1) \in B^{(\varepsilon)}(k-1)\bigr) = \mathbf{E}\,\mathbf{P}\bigl((\delta + \xi_{k,n})e(k-1) \in B^{(\varepsilon)}(k-1) - \zeta(k, 0) \bigm| \zeta(k, 0)\bigr) \tag{20.1.11}$$

the set $C = B^{(\varepsilon)}(k-1) - \zeta(k, 0)$ may be assumed fixed (see the properties of conditional expectations; here $\delta$ and $\xi_{k,n}$ are independent of $\zeta(k, 0)$). Denote by $D$ the set of all $y$ for which $y\,e(k-1) \in C$. We have to bound the difference

$$\mathbf{P}(\delta + \xi_{k,n} \in D) - \mathbf{P}(\delta + \eta_{k,n} \in D). \tag{20.1.12}$$

We make use of Lemma 8.5.1. To transform (20.1.12) to a form convenient for applying the lemma, take $\delta$ to be a random variable having a thrice continuously differentiable density $g(t)$ and put for brevity $\xi_{k,n} = \xi$ and $\eta_{k,n} = \eta$. Then $\delta + \xi$ will have a density equal to

$$\int dF_\xi(t)\,g(y - t) = \mathbf{E}\,g(y - \xi),$$

so that

$$\mathbf{P}(\delta + \xi \in D) = \int_D \mathbf{E}\,g(y - \xi)\,dy = \mathbf{E}\int_D g(y - \xi)\,dy.$$

Now putting

$$h(x) := \int_D g(y - x)\,dy,$$

we have

$$\mathbf{P}(\delta + \xi \in D) = \mathbf{E}\,h(\xi),$$

where $h$ is a thrice continuously differentiable function,

$$|h'''(x)| \le \int |g'''(y)|\,dy =: h_3.$$

Applying now Lemma 8.5.1 we obtain that

$$\bigl|\mathbf{P}(\delta + \xi \in D) - \mathbf{P}(\delta + \eta \in D)\bigr| = \bigl|\mathbf{E}\bigl(h(\xi) - h(\eta)\bigr)\bigr| \le \frac{h_3}{6}\,\mu_{k,n}^0.$$

Because the right-hand side here does not depend on $\zeta(k, 0)$ and $D$ in any way, we get, returning to (20.1.10) and (20.1.11), the estimate

$$|\Delta_k| \le \frac{h_3}{6}\,\mu_{k,n}^0. \tag{20.1.13}$$

Now let $g_1(x)$ be a smooth density concentrated on $[-1, 1]$. Then, putting

$$g(x) := \frac{1}{\varepsilon}\,g_1\Bigl(\frac{x}{\varepsilon}\Bigr),$$

we obtain that

$$h_3 = \varepsilon^{-3}\int |g_1'''(x)|\,dx = c_1\varepsilon^{-3}, \qquad c_1 = \mathrm{const}. \tag{20.1.14}$$

The assertion of Lemma 20.1.1 now follows from (20.1.9), (20.1.13) and (20.1.14). □


Proof of Theorem 20.1.3 This theorem is a consequence of Lemma 20.1.1. Indeed, let $B \subset \mathbb{R}^n$ be such that the events $\{s_n \in A\}$ and $\{\zeta \in B\}$ are equivalent ($s_n$ is completely determined by $\zeta$). Then clearly $\{s_n \in A^{(\varepsilon)}\} = \{\zeta \in B^{(\varepsilon)}\}$ and the assertion of Theorem 20.1.3 repeats that of Lemma 20.1.1. Theorem 20.1.3 is proved. □


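Proof of Theorem 20.1.1 Let $w(t)$ be the standard Wiener process. Put $\eta_{k,n} := w(t_{k,n}) - w(t_{k-1,n})$. Then the sequence $\eta_{1,n}, \ldots, \eta_{n,n}$ satisfies all the required conditions, for

$$\mathbf{E}\eta_{k,n} = 0, \qquad \mathbf{E}\eta_{k,n}^2 = \sigma_{k,n}^2, \qquad \nu_{k,n} = \mathbf{E}|\eta_{k,n}|^3 = c_3\sigma_{k,n}^3 < \infty.$$

Note also that

$$\sigma_{k,n}^3 = \bigl(\mathbf{E}|\xi_{k,n}|^2\bigr)^{3/2} \le \mathbf{E}|\xi_{k,n}|^3 = \mu_{k,n},$$

so that

$$N_3 = \sum_{k=1}^{n} \nu_{k,n} = c_3\sum_{k=1}^{n} \sigma_{k,n}^3 \le c_3 L_3 \to 0.$$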

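We will need the following

Lemma 20.1.3 $\mathbf{P}(\rho(w, s'_n) > \varepsilon) \le cN_3/\varepsilon^3$.

Proof The event $\{\rho(w, s'_n) > \varepsilon\}$ is equal to $\bigcup_k A_k$, where

$$A_k := \Bigl\{\sup_{t \in \delta_k} |w(t) - s'_n(t)| > \varepsilon\Bigr\} \subset \Bigl\{\sup_{t \in \delta_k} |w(t) - w(t_{k-1})| > \frac{\varepsilon}{2}\Bigr\}, \qquad \delta_k := [t_{k-1}, t_k].$$

Therefore, recalling that $t_k - t_{k-1} = \sigma_{k,n}^2$ and $w(t) \overset{d}{=} \sigma w(t/\sigma^2)$, we have

$$\mathbf{P}(A_k) \le \mathbf{P}\Bigl(\sup_{t \in [0,1)} |w(t)| > \frac{\varepsilon}{2\sigma_{k,n}}\Bigr) \le 4\Bigl(1 - \Phi\Bigl(\frac{\varepsilon}{2\sigma_{k,n}}\Bigr)\Bigr).$$

The function $1 - \Phi(t)$ vanishes as $t \to \infty$ much faster than $t^{-3}$. Hence

$$4\Bigl(1 - \Phi\Bigl(\frac{\varepsilon}{2\sigma_{k,n}}\Bigr)\Bigr) \le \frac{c\,\sigma_{k,n}^3}{\varepsilon^3}, \qquad \mathbf{P}\Bigl(\bigcup_k A_k\Bigr) \le c\,\varepsilon^{-3}\sum_{k=1}^{n} \sigma_{k,n}^3 = \frac{cN_3}{\varepsilon^3}.$$

Lemma 20.1.3 is proved. □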

We see from the proof that the bound stated by Lemma 20.1.3 is rather crude.

We return to the proof of Theorem 20.1.1. Because

$$\mathbf{P}(s'_n \in G) = \mathbf{P}\bigl(s'_n \in G,\ \rho(w, s'_n) \le \varepsilon\bigr) + \mathbf{P}\bigl(s'_n \in G,\ \rho(w, s'_n) > \varepsilon\bigr),$$

we have

$$\mathbf{P}(s'_n \in G) \le \mathbf{P}\bigl(w \in G^{(\varepsilon)}\bigr) + \frac{cN_3}{\varepsilon^3} \tag{20.1.15}$$

and, by Theorem 20.1.3,

$$\mathbf{P}(s_n \in G) \le \mathbf{P}\bigl(w \in G^{(3\varepsilon)}\bigr) + \frac{c(L_3^0 + N_3)}{\varepsilon^3}.$$

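Now we prove the converse inequality. Introduce the set $G^{(-\varepsilon)} := G - (\partial G)^{(\varepsilon)}$. Then $[G^{(-\varepsilon)}]^{(\varepsilon)} =: G^0 \subset G$. Swapping $s_n$ and $s'_n$ in Theorem 20.1.3 and applying the latter to the set $G^{(-2\varepsilon)}$, we obtain

$$\mathbf{P}(s_n \in G^0) \ge \mathbf{P}\bigl(s'_n \in G^{(-2\varepsilon)}\bigr) - \frac{c(L_3 + N_3)}{\varepsilon^3}. \tag{20.1.16}$$

Swapping $w$ and $s'_n$ in (20.1.15) and applying that relation to $G^{(-3\varepsilon)}$, we find that

$$\mathbf{P}\bigl(s'_n \in G^{(-2\varepsilon)}\bigr) \ge \mathbf{P}\bigl(w \in G^{(-3\varepsilon)}\bigr) - \frac{cN_3}{\varepsilon^3}.$$

This and (20.1.16) imply that

$$\mathbf{P}(s_n \in G) \ge \mathbf{P}(s_n \in G^0) \ge \mathbf{P}\bigl(w \in G^{(-3\varepsilon)}\bigr) - \frac{c(L_3 + N_3)}{\varepsilon^3}.$$

Setting

$$\mathbf{P}\bigl(w \in G^{(\varepsilon)}\bigr) - \mathbf{P}(w \in G) = W\bigl(G^{(\varepsilon)}\bigr) - W(G) =: W_G(\varepsilon)$$

and taking into account that $N_3 \le cL_3$ and $L_3^0 \le L_3 + N_3$, we will obtain that

$$W_G(-3\varepsilon) - \frac{cL_3}{\varepsilon^3} \le \mathbf{P}(s_n \in G) - W(G) \le W_G(3\varepsilon) + \frac{cL_3}{\varepsilon^3}. \tag{20.1.17}$$

If $W(\partial G) = 0$ then clearly

$$W\bigl(G^{(3\varepsilon)}\bigr) - W\bigl(G^{(-3\varepsilon)}\bigr) \to 0$$

as $\varepsilon \to 0$, and $W_G(\pm 3\varepsilon) \to 0$. From this and (20.1.17) it is easy to derive that

$$\mathbf{P}(s_n \in G) \to \mathbf{P}(w \in G), \qquad n \to \infty.$$

Convergence $f(s_n) \Rightarrow f(w)$ for continuous functionals follows from (20.1.4), since if $v$ is a point of continuity of the distribution of $f(w)$ then the set $G_v = \{x \in C(0, 1) : f(x) < v\}$ has the property

$$W(\partial G_v) = \mathbf{P}(f(w) = v) = 0$$

and therefore

$$\mathbf{P}(f(s_n) < v) \to \mathbf{P}(f(w) < v).$$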

Theorem 20.1.1 is proved. □

Proof of Theorem 20.1.2 If $G$ is a Lipschitz set, then

$$|W_G(\pm 3\varepsilon)| < c\varepsilon,$$

and by (20.1.17)

$$\bigl|\mathbf{P}(s_n \in G) - W(G)\bigr| < c\Bigl[\varepsilon + \frac{L_3}{\varepsilon^3}\Bigr].$$

Putting $\varepsilon := L_3^{1/4}$ we obtain the required assertion. Theorem 20.1.2 is proved. □
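The choice of $\varepsilon$ can be motivated by balancing the two terms of the bound, which is a standard device rather than anything specific to this proof: the first term grows and the second decreases in $\varepsilon$, and they are of the same order when

$$\varepsilon = \frac{L_3}{\varepsilon^3} \iff \varepsilon^4 = L_3 \iff \varepsilon = L_3^{1/4}, \qquad\text{which gives}\qquad c\Bigl[\varepsilon + \frac{L_3}{\varepsilon^3}\Bigr] = 2c\,L_3^{1/4}.$$

Any other power of $L_3$ would make one of the two terms decay more slowly than $L_3^{1/4}$.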


The reason for the name “invariance principle” used to refer to the main assertions of this section is best illustrated by Theorem 20.1.3. By virtue of the theorem, one can approximate the value of $\mathbf{P}(s_n \in A)$ by $\mathbf{P}(s'_n \in A)$ for any other sequence (20.1.6) having the same first two moments as (20.1.1). In that sense, the asymptotics of $\mathbf{P}(s_n \in A)$ are invariant with respect to particular distributions of the underlying sequences with fixed first two moments. For example, the calculation of $\mathbf{P}(s_n \in G)$ or $\mathbf{P}(w(t) \in G)$ can be replaced with that of $\mathbf{P}(s'_n \in G)$ for a Bernoulli sequence, which is convenient for various numerical methods. On the other hand, the probabilities $\mathbf{P}(w \in G)$ for a whole class of regions $G$ were found in explicit form (see e.g. [32]). We know, for example, that $\mathbf{P}(\sup_{t \in [0,1]} w(t) > y) = 2(1 - \Phi(y))$. (This implies, in particular, that $G = \{x \in C(0, 1) : \sup_{t \in [0,1]} x(t) > y\}$ is a Lipschitz set.) Hence for the distribution of the maximum $\overline{S}_n = \max_{k \le n} S_k$ of the sums $S_k = \sum_{j=1}^{k} \xi_j$, when $\mathbf{E}\xi_k = 0$ and $\operatorname{Var}\xi_k = \sigma^2$, we have

$$\mathbf{P}\bigl(\overline{S}_n > x\sigma\sqrt{n}\bigr) \to 2\bigl(1 - \Phi(x)\bigr), \qquad n \to \infty,$$

and one can use this relation for the approximate calculation of the distribution of $\overline{S}_n$ which is, as we saw in Chap. 12, of substantial interest in applications.
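For a Bernoulli sequence the distribution of the maximum is available exactly, which makes the limit relation easy to inspect. The sketch below assumes the simple symmetric walk with steps $\pm 1$ (so $\sigma = 1$) and uses the classical reflection identity $\mathbf{P}(\overline{S}_n \ge m) = \mathbf{P}(S_n \ge m) + \mathbf{P}(S_n > m)$ for integer $m \ge 1$; the function names are illustrative.

```python
import itertools
import math

def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_endpoint(n, j):
    """P(S_n = j) for the simple symmetric +-1 walk."""
    if (n + j) % 2 or abs(j) > n:
        return 0.0
    return math.comb(n, (n + j) // 2) / 2.0 ** n

def prob_max_at_least(n, m):
    """P(max_{k<=n} S_k >= m), m >= 1, via the reflection principle."""
    at_least = sum(prob_endpoint(n, j) for j in range(m, n + 1))
    above = sum(prob_endpoint(n, j) for j in range(m + 1, n + 1))
    return at_least + above

def brute_force_max_at_least(n, m):
    """Same probability by enumerating all 2^n paths (small n only)."""
    hits = 0
    for steps in itertools.product((-1, 1), repeat=n):
        position, best = 0, 0
        for step in steps:
            position += step
            best = max(best, position)
        hits += best >= m
    return hits / 2.0 ** n

n, x = 400, 1.0
m = int(x * math.sqrt(n))                 # level x * sigma * sqrt(n) with sigma = 1
exact = prob_max_at_least(n, m)
limit = 2.0 * (1.0 - normal_cdf(x))
```

The brute-force routine is there only to cross-check the reflection formula for small `n`; the comparison of `exact` with `limit` illustrates the approximation stated above.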

In the same way we can approximate the joint distribution of $S_n$, $\overline{S}_n$, and $\underline{S}_n := \min_{k \le n} S_k$ (i.e. the probabilities of the form $\mathbf{P}(\overline{S}_n < x\sqrt{n},\ \underline{S}_n > y\sqrt{n},\ S_n \in B)$) using the respective formulas for the Wiener process given in Skorokhod (1991).


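Remark 20.1.2 In conclusion of this section note that all the above assertions will remain true if, instead of $s_n(t)$, we consider in them the step function $s_n^*(t) = \zeta_{k,n}$ for $t \in [t_k, t_{k+1})$. One can verify this by repeating anew all the arguments for $s_n^*$. Another way to obtain, say, Theorems 20.1.1 and 20.1.2 for $s_n^*$ is to make use of the already obtained results and bound the distance $\rho(s_n, s_n^*)$. Because

$$\bigl\{\rho(s_n, s_n^*) > \varepsilon\bigr\} \subset \bigcup_{k=1}^{n} \bigl\{|\xi_{k,n}| > \varepsilon\bigr\},$$

one has

$$\mathbf{P}\bigl(\rho(s_n, s_n^*) > \varepsilon\bigr) \le \sum_{k=1}^{n} \mathbf{P}\bigl(|\xi_{k,n}| > \varepsilon\bigr) \le \sum_{k=1}^{n} \frac{\mu_{k,n}}{\varepsilon^3} = \frac{L_3}{\varepsilon^3}.$$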

Recall that a similar bound was obtained for $\rho(s'_n, w)$, and this allowed us to replace, where it was needed, the process $s'_n$ with $w$. Therefore, using the same argument, one can replace $s_n$ with $s_n^*$. In that case, we can consider convergence of the distributions of functionals $f(s_n^*)$ defined on $D(0, 1)$ (and continuous in the uniform metric $\rho$). Sometimes the use of $s_n^*$ is more convenient than that of $s_n$. This is the case, for example, when one has to find the limiting distribution of

$$\sum_{k=1}^{n} g(\zeta_{k,n}) = n\int_0^1 g\bigl(s_n^*(t)\bigr)\,dt$$

($\xi_{k,n}$ are identically distributed). It follows from the above representation that

$$\frac{1}{n}\sum_{k=1}^{n} g(\zeta_{k,n}) \Rightarrow \int_0^1 g\bigl(w(t)\bigr)\,dt, \qquad n \to \infty.$$

20.2 The Law of the Iterated Logarithm

Let $\xi_1, \xi_2, \ldots$ be a sequence of independent random variables,

$$\mathbf{E}\xi_k = 0, \qquad \mathbf{E}\xi_k^2 = \sigma_k^2, \qquad \mathbf{E}|\xi_k|^3 = \mu_k,$$

$$S_n = \sum_{k=1}^{n} \xi_k, \qquad B_n^2 = \sum_{k=1}^{n} \sigma_k^2, \qquad M_n = \sum_{k=1}^{n} \mu_k.$$

In this notation, the Lyapunov ratio is equal to

$$L_3 = L_{3,n} = \frac{M_n}{B_n^3}.$$

In the present section, we will show that the law of the iterated logarithm for the Wiener process and Theorem 20.1.2 imply the following.

Theorem 20.2.1 (The law of the iterated logarithm for sums of random variables) If $B_n \to \infty$ as $n \to \infty$ and $L_{3,n} < c/\ln B_n$ for some $c < \infty$, then

$$\mathbf{P}\Bigl(\limsup_{n \to \infty} \frac{S_n}{B_n\sqrt{2\ln\ln B_n}} = 1\Bigr) = 1, \tag{20.2.1}$$

$$\mathbf{P}\Bigl(\liminf_{n \to \infty} \frac{S_n}{B_n\sqrt{2\ln\ln B_n}} = -1\Bigr) = 1. \tag{20.2.2}$$

Thus all the sequences which lie above

$$(1 + \varepsilon)B_n\sqrt{2\ln\ln B_n}$$

will be upper for the sequence of sums $S_n$, while all the sequences below

$$(1 - \varepsilon)B_n\sqrt{2\ln\ln B_n}$$

will be lower.

The conditions of the theorem will clearly be satisfied for identically distributed $\xi_k$, for in that case $B_n^2 = \sigma_1^2 n$, $L_{3,n} = \mu_1/(\sigma_1^3\sqrt{n})$.


Proof We turn to the proof of the law of the iterated logarithm in Theorem 19.3.2 and apply it to the sequence $S_n$. We will not need to introduce any essential changes. One just has to consider $S_{n_k}$ instead of $w(a^k)$, where $n_k = \min\{n : B_n^2 \ge a^k\}$, and replace $a^k$ with $B_{n_k}^2$ where it is needed. By the Lyapunov condition, $\max_{k \le n} \sigma_k^2 = o(B_n^2)$, so that $B_{n_k - 1}^2 \sim B_{n_k}^2 \sim a^k$ as $k \to \infty$.

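The key point in the proof of Theorem 19.3.2 is the proof of convergence (for any $\varepsilon > 0$) of the series

$$\sum_k \mathbf{P}\Bigl(\sup_{u \le a^k} w(u) > (1 + \varepsilon)x_{k-1}\Bigr) \tag{20.2.3}$$

and divergence of the series

$$\sum_k \mathbf{P}\bigl(w(a^k) - w(a^{k-1}) > (1 - \varepsilon)x_k\bigr), \tag{20.2.4}$$

where

$$x_k = \sqrt{2a^k\ln\ln a^k}, \qquad w(a^k) - w(a^{k-1}) \overset{d}{=} w\bigl(a^k(1 - a^{-1})\bigr).$$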

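In our case, if one follows the same argument, one has to prove the convergence of the series

$$\sum_k \mathbf{P}\bigl(\overline{S}_{n_k} > (1 + \varepsilon)y_{k-1}\bigr) \tag{20.2.5}$$

and divergence of the series

$$\sum_k \mathbf{P}\bigl(S_{n_k} - S_{n_{k-1}} > (1 - \varepsilon)y_k\bigr), \tag{20.2.6}$$

where $y_k = \sqrt{2B_{n_k}^2\ln\ln B_{n_k}^2} \sim x_k$. But the asymptotic behaviour of the probabilities of the events in (20.2.3), (20.2.5) and (20.2.4), (20.2.6) under the conditions $L_{3,n} < c/\ln B_n$ will essentially be the same. To establish this, we will make use of the inequalities

$$\mathbf{P}(s_n \in G) \le \mathbf{P}\bigl(w \in G^{(\delta)}\bigr) + \frac{cL_3}{\delta^3}, \qquad \mathbf{P}(s_n \in G) \ge \mathbf{P}\bigl(w \in G^{(-\delta)}\bigr) - \frac{cL_3}{\delta^3}, \tag{20.2.7}$$

which follow from the proof of Theorem 20.1.3. By these inequalities,

$$\mathbf{P}\Bigl(\frac{\overline{S}_n}{B_n} > (1 + 3\varepsilon)x\Bigr) \le \mathbf{P}\Bigl(\sup_{u \le 1} w(u) > (1 + 2\varepsilon)x\Bigr) + \frac{cL_{3,n}}{(\varepsilon x)^3} = \mathbf{P}\Bigl(\sup_{u \le B_n^2} w(u) > (1 + 2\varepsilon)xB_n\Bigr) + \frac{cL_{3,n}}{(\varepsilon x)^3}.$$

Therefore (see (20.2.5)), putting $n := n_k$ and $x := y_{k-1}/B_{n_k}$, we obtain

$$\mathbf{P}\bigl(\overline{S}_{n_k} > (1 + 3\varepsilon)y_{k-1}\bigr) \le \mathbf{P}\Bigl(\sup_{u \le B_{n_k}^2} w(u) > (1 + 2\varepsilon)y_{k-1}\Bigr) + \frac{cL_{3,n_k}}{\varepsilon^3(\ln\ln B_{n_k}^2)^{3/2}}.$$

Here

$$B_{n_k}^2 \ge a^k, \qquad \ln\ln B_{n_k}^2 \sim \ln\ln a^k \sim \ln k, \qquad L_{3,n_k} \le \frac{c}{\ln B_{n_k}} \le \frac{c}{\ln a^k} = \frac{c_1}{k}.$$

Consequently, for all sufficiently large $k$ (recall that the letter $c$ denotes different constants),

$$\mathbf{P}\bigl(\overline{S}_{n_k} > (1 + 3\varepsilon)y_{k-1}\bigr) \le \mathbf{P}\Bigl(\sup_{u \le a^k} w(u) > (1 + \varepsilon)x_{k-1}\Bigr) + \frac{c}{\varepsilon^3 k(\ln k)^{3/2}}.$$

Since

$$\sum_{k=2}^{\infty} \frac{1}{k(\ln k)^{3/2}} < \infty, \tag{20.2.8}$$

the above inequality means that the convergence of series (20.2.3) implies that of series (20.2.5). The first part of the theorem is proved.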
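Convergence in (20.2.8) can be confirmed by the integral test, which is a standard argument added here as a reminder. Since $x \mapsto 1/(x(\ln x)^{3/2})$ is decreasing for $x > 1$, the substitution $u = \ln x$ gives

$$\sum_{k=3}^{\infty} \frac{1}{k(\ln k)^{3/2}} \le \int_2^{\infty} \frac{dx}{x(\ln x)^{3/2}} = \int_{\ln 2}^{\infty} \frac{du}{u^{3/2}} = \frac{2}{\sqrt{\ln 2}} < \infty.$$

The same computation with exponent 1 in place of $3/2$ would diverge, which is why the factor $(\ln\ln B_{n_k}^2)^{-3/2}$ coming from $x^{-3}$ is essential.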

The second part is proved in a similar way. Consider series (20.2.6). By (20.2.7),

$$\begin{aligned}
\mathbf{P}\bigl(S_{n_k} - S_{n_{k-1}} > (1 - 3\varepsilon)y_k\bigr)
&= \mathbf{P}\Bigl(s_{n_k}(1) - s_{n_k}(r_k) > (1 - 3\varepsilon)\frac{y_k}{B_{n_k}}\Bigr) \\
&\ge \mathbf{P}\Bigl(w(1) - w(r_k) > (1 - 2\varepsilon)\frac{y_k}{B_{n_k}}\Bigr) - \frac{cL_{3,n_k}}{\varepsilon^3}\bigl(\ln\ln B_{n_k}^2\bigr)^{-3/2},
\end{aligned} \tag{20.2.9}$$

where $r_k = B_{n_{k-1}}^2 B_{n_k}^{-2} \to a^{-1}$ due to the fact that

$$B_{n_k}^2 = a^k + \theta_k\sigma_{n_k}^2, \qquad 0 \le \theta_k \le 1, \qquad \sigma_{n_k}^2 = o\bigl(B_{n_k}^2\bigr).$$

The first term on the right-hand side of (20.2.9) is equal to

$$\mathbf{P}\Bigl(w(1 - r_k) > (1 - 2\varepsilon)\frac{y_k}{B_{n_k}}\Bigr) \ge \mathbf{P}\bigl(w(a^k(1 - r_k)) > (1 - \varepsilon)x_k\bigr).$$

As before, the series consisting of the second terms on the right-hand side of (20.2.9) converges by virtue of (20.2.8). Therefore the established inequalities mean that the divergence of series (20.2.4) implies that of series (20.2.6). The theorem is proved. □

Now we will present an example that we need to complete the proof in Remark 4.4.1.

Example 20.2.1 Let $\zeta_k$ be independent and identically distributed, $\mathbf{E}\zeta_k = 0$, $\mathbf{E}\zeta_k^2 = 1$, $\mathbf{E}|\zeta_k|^3 = \mu < \infty$ and $\xi_k = \sqrt{2k}\,\zeta_k$. Here we have $B_n^2 = n^2(1 + 1/n)$. In Remark 4.4.1 we used the assertion that (in a somewhat different notation)

$$\mathbf{P}\Bigl(\bigcup_{n=1}^{\infty} \{S_n < -n\}\Bigr) = 1$$

or, which is the same (as the sign of $S_n$ is inessential),

$$\mathbf{P}\Bigl(\bigcup_{n=1}^{\infty} \{S_n > n\}\Bigr) = 1. \tag{20.2.10}$$

To verify it, we will show that any sequence of the form $B'_n = B_n(1 + O(1/n))$ is lower for $\{S_k\}$. In our case,

$$M_n = \mu\sum_{k=1}^{n} (2k)^{3/2} \sim cn^{5/2}, \qquad L_{3,n} = \frac{M_n}{B_n^3} \sim cn^{-1/2} \ll \frac{1}{\ln B_n}.$$

This means that the conditions of Theorem 20.2.1 are met, and hence any sequence which lies lower than $(1 - \varepsilon)n\sqrt{2\ln\ln n}$ (in particular, the sequence $B'_n = n$) is lower for $\{S_k\}$. This proves (20.2.10). □
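The quantities in this example can be computed directly. The sketch below assumes $\mu = 1$ (attained, for instance, when $\zeta_k = \pm 1$ with equal probabilities); the function name is illustrative.

```python
import math

def lyapunov_ratio(n, mu=1.0):
    """Return (B_n^2, L_{3,n}) for xi_k = sqrt(2k) * zeta_k with E|zeta_k|^3 = mu."""
    b2 = sum(2 * k for k in range(1, n + 1))                 # B_n^2 = sum of Var(xi_k)
    m = mu * sum((2 * k) ** 1.5 for k in range(1, n + 1))    # M_n = sum of E|xi_k|^3
    return b2, m / b2 ** 1.5

# L_{3,n} * ln(B_n) should tend to 0 if L_{3,n} is of smaller order than 1 / ln(B_n).
products = []
for n in (100, 1000, 10000):
    b2, ratio = lyapunov_ratio(n)
    products.append(ratio * math.log(math.sqrt(b2)))
```

The values in `products` decrease with `n`, and $L_{3,n}\sqrt{n}$ settles near $2^{3/2} \cdot 2/5$, matching the asymptotics $L_{3,n} \sim cn^{-1/2}$.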


Let us return to Theorem 20.2.1. As we saw in Sect. 19.3, the proof of the law of the iterated logarithm is based on the asymptotics (the rate of decrease) of the function $1 - \Phi(x)$ as $x \to \infty$. Therefore, the conditions for the law of the iterated logarithm for the sums $S_n$ are related to the width of the range of $x$ values for which the probabilities

$$P_{n\pm}(x) := \mathbf{P}\Bigl(\pm\frac{S_n}{B_n} > x\Bigr)$$

are approximated by the normal law (i.e. by the function $1 - \Phi(x)$). Here we encounter the problem of large deviations (see Chap. 9). If

$$\frac{P_{n\pm}(x)}{1 - \Phi(x)} \to 1 \tag{20.2.11}$$

as $n \to \infty$ for all

$$x \le \sqrt{2\ln\ln B_n}\,(1 - \varepsilon) \tag{20.2.12}$$

and some $\varepsilon > 0$ then the proof of the law of the iterated logarithm for the Wiener process given in Sect. 19.3 can easily be extended to the sums $S_n/B_n$ (to estimate $\mathbf{P}(\overline{S}_n/B_n > x)$ one has to use the Kolmogorov inequality; see Corollary 11.2.1).

One  way  to  establish  (20.2.11)  and  (20.2.12)  is  to  use  estimates  for  the  rate 
of  convergence  in  the  central  limit  theorem.  This  approach  was  employed  in  the 
proof  of  Theorem  20.2.1,  where  we  used  Theorem  20.1.3.  However,  to  ensure  that 
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(20.2.11)  and  (20.2.12)  hold  one  can  use  weaker  assertions  than  Theorem  20.1.3.  To 
some  extent,  this  fact  is  illustrated  by  the  following  assertion  (see  [32]): 


Theorem 20.2.2 If $B_n \to \infty$ and $B_{n+1}/B_n \to 1$ as $n \to \infty$, and

$$\sup_x \Big|\mathbf{P}\Big(\frac{S_n}{B_n} < x\Big) - \Phi(x)\Big| \le c\,(\ln B_n)^{-1-\delta}$$

for some $\delta > 0$ and $c < \infty$, then the law of the iterated logarithm holds.


If $\xi_k \overset{d}{=} \xi$ are identically distributed then Theorem 20.2.1 implies that the law of the iterated logarithm is valid whenever $\mathbf{E}|\xi|^3$ exists. In fact, however, for identically distributed $\xi_k$, the law of the iterated logarithm always holds in the case of a finite second moment, without any additional conditions.


Theorem 20.2.3 (Hartman-Wintner, [32]) If the $\xi_k$ are identically distributed, $\mathbf{E}\xi_k = 0$, and $\mathbf{E}\xi_k^2 = 1$, then (20.2.1) and (20.2.2) hold with $B_n^2$ replaced with $n$. Every point from the segment $[-1, 1]$ is a limiting one for the sequence

$$\frac{S_n}{\sqrt{2n\ln\ln n}}.$$

The last assertion of the theorem means that, for each $t \in [-1, 1]$ and any $\varepsilon > 0$, the interval $(t - \varepsilon, t + \varepsilon)$ contains, with probability 1, infinitely many elements of the sequence

$$\frac{S_n}{\sqrt{2n\ln\ln n}}.$$


20.3  Convergence  to  the  Poisson  Process 

20.3.1  Convergence  of  the  Processes  of  Cumulative  Sums 

The theorems of Sects. 20.1 and 20.2 show that the Wiener process describes rather well the evolution of the cumulative sums when summing “conventional” random variables $\xi_{k,n}$ satisfying the Lyapunov condition. It turns out that the Poisson process describes in a similar way the evolution of the cumulative sums when the random variables $\xi_{k,n}$ correspond to the occurrence of rare events.

As in Sect. 5.4, first we will not consider the triangular array scheme, but obtain precise inequalities describing the proximity of the processes under study. Consider independent random variables $\xi_1, \ldots, \xi_n$ with Bernoulli distributions:

$$\mathbf{P}(\xi_k = 1) = p_k, \qquad \mathbf{P}(\xi_k = 0) = 1 - p_k, \qquad \sum_{k=1}^{n} p_k = \mu.$$

We will assume that $\tilde p := \max_{k \le n} p_k$ is small and the number $\mu$ is “comparable with 1”. Put

$$q_0 := 0, \qquad q_k := \frac{p_k}{\mu}, \qquad Q_k := \sum_{j=0}^{k} q_j,$$

and form a random function $s_n(t)$ on $[0, 1]$ in the following way. Put $s_n(0) := 0$,

$$s_n(t) := S_k = \sum_{j=1}^{k} \xi_j \quad \text{for } t \in (Q_{k-1}, Q_k], \quad k = 1, \ldots, n.$$

Here it is more convenient to use a step function rather than a continuous trajectory $s_n(t)$ (cf. Remark 20.1.2). The assertions to be obtained in this section are similar to the invariance principle and state that the process $s_n(t)$ converges in a certain sense to the Poisson process $\pi(t)$ with intensity $\mu$ on $[0, 1]$. This convergence could of course be treated as weak convergence of distributions in the metric space $D(0, 1)$. But in the framework of the present book, it is apparently inexpedient for at least two reasons:

1. To do that, we would have to introduce a metric in $D(0, 1)$ and study its properties, which is somewhat complicated by itself.

2. The trajectories of the processes $s_n(t)$ and $\pi(t)$ are of a simple form, and characterising their closeness can be done in a simpler and more precise way without using more general concepts. Indeed, as we saw, the trajectory of $\pi(t)$ on $[0, 1]$ is completely determined by the collection of random variables $(\pi(1); T_1, \ldots, T_{\pi(1)})$, where $T_k$ is the epoch of the $k$-th jump of the process, $T_{k+1} - T_k \subset\!\!\!= \mathbf{\Gamma}_\mu$. A similar characterisation is valid for the trajectories of $s_n(t)$: they are determined by the vector $(s_n(1), \theta_1, \ldots, \theta_{s_n(1)})$, where $\theta_k = Q_{\gamma_k}$, and $\gamma_1, \gamma_2, \ldots$ are the values $j$ for which $\xi_j = 1$. We will say that the distributions of $s_n(t)$ and $\pi(t)$ are close to each other if the distributions of the above vectors are close. This convention will correspond to convergence of the processes in a rather strong and natural sense.

It is not hard to see from what we said before about the Poisson processes (see Sect. 19.4) that the introduced convergence of the distributions of the jump points of the process $s_n(t)$ is equivalent to convergence of the finite-dimensional distributions of $s_n(t)$ to those of $\pi(t)$ (we know that the trajectories of $s_n(t)$ are step functions).


Theorem 20.3.1 The processes $s_n(t)$ and $\pi(t)$ can be constructed on a common probability space so that

$$\mathbf{P}\big(s_n(1) = \pi(1);\ \theta_k - q_{\gamma_k} < T_k \le \theta_k,\ k = 1, \ldots, \pi(1)\big) \ge 1 - \sum_{j=1}^{n} p_j^2. \tag{20.3.1}$$

Since $\sum_{j=1}^{n} p_j^2 \le \mu\tilde p$, the smallness of $\tilde p$ means that, with probability close to 1, the values of $s_n(1)$ and $\pi(1)$ coincide (cf. Theorem 5.4.2) and the positions of the respective points of jumps of the processes $s_n(t)$ and $\pi(t)$ do not differ much.

Put $\tilde q = \tilde p/\mu$ and, for a fixed $k \ge 1$, denote by $B^{(\varepsilon)}$ the $\varepsilon$-neighbourhood of the orthant set $B := \{(x_1, \ldots, x_k) : x_j \le v_j,\ j \le k\}$ for some $v_j > 0$. Theorem 20.3.1 implies the following.

Corollary 20.3.1 For any $k = 1, \ldots, n$,

$$\mathbf{P}\big(s_n(1) = k,\ (\theta_1, \ldots, \theta_k) \in B\big) \le \mathbf{P}\big(\pi(1) = k,\ (T_1, \ldots, T_k) \in B\big) + \sum_{j=1}^{n} p_j^2;$$

$$\mathbf{P}\big(\pi(1) = k,\ (T_1, \ldots, T_k) \in B\big) \le \mathbf{P}\big(s_n(1) = k,\ (\theta_1, \ldots, \theta_k) \in B^{(\tilde q)}\big) + \sum_{j=1}^{n} p_j^2.$$

Proof Let $A_n$ denote the event appearing on the left-hand side of (20.3.1),

$$D_n := \{s_n(1) = k,\ (\theta_1, \ldots, \theta_k) \in B\}, \qquad C_n := \{\pi(1) = k,\ (T_1, \ldots, T_k) \in B\}.$$

Then, by virtue of (20.3.1),

$$\mathbf{P}(D_n) \le \mathbf{P}(D_n A_n) + \sum_{j=1}^{n} p_j^2 \le \mathbf{P}\big(D_n,\ \pi(1) = k,\ (T_1, \ldots, T_k) \in B\big) + \sum_{j=1}^{n} p_j^2 \le \mathbf{P}\big(\pi(1) = k,\ (T_1, \ldots, T_k) \in B\big) + \sum_{j=1}^{n} p_j^2.$$

The converse inequality is established similarly. The corollary is proved. □

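Proof of Theorem 20.3.1 Let $\eta_k := \pi(Q_k) - \pi(Q_{k-1})$, $k = 1, \ldots, n$. The theorem will be proved if we construct $\{\xi_k\}$ and $\{\eta_k\}$ on a common probability space so that

$$\mathbf{P}\Big(\bigcap_{k=1}^{n} \{\xi_k = \eta_k\}\Big) \ge 1 - \sum_{j=1}^{n} p_j^2. \tag{20.3.2}$$

A construction leading to (20.3.2) has essentially already been used in Theorem 5.4.2. The required construction will be obtained if we consider independent random variables $\omega_1, \ldots, \omega_n$; $\omega_k \subset\!\!\!= \mathbf{U}_{0,1}$, and put

$$\xi_k := \begin{cases} 0 & \text{if } \omega_k < 1 - p_k, \\ 1 & \text{if } \omega_k \ge 1 - p_k, \end{cases} \qquad \eta_k := \begin{cases} 0 & \text{if } \omega_k < e^{-p_k} =: \pi_{0,k}, \\ j \ge 1 & \text{if } \omega_k \in [\pi_{j-1,k}, \pi_{j,k}), \end{cases}$$

where $\pi_{j,k} = \mathbf{\Pi}_{p_k}([0, j])$, $j = 0, 1, \ldots$. Then $\eta_k \subset\!\!\!= \mathbf{\Pi}_{p_k}$, $\sum_{k=1}^{n} \eta_k \subset\!\!\!= \mathbf{\Pi}_\mu$,

$$\{\xi_k \ne \eta_k\} = \big\{\omega_k \in [1 - p_k, e^{-p_k}) \cup [e^{-p_k} + p_k e^{-p_k}, 1]\big\}.$$

Therefore,

$$\mathbf{P}(\xi_k \ne \eta_k) \le p_k^2$$

and we get (20.3.2). The theorem is proved. □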
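The coupling in the proof can be checked numerically. The following Python sketch builds $\xi_k$ and $\eta_k$ from a single uniform value $\omega$ and compares the measure of the set $\{\xi_k \ne \eta_k\}$ with $p_k^2$. The function names and the truncation level `jmax` are illustrative assumptions, not part of the text.

```python
import math

def poisson_cdf_points(p, jmax=50):
    """Return [pi_0, pi_1, ...] with pi_j = P(Poisson(p) <= j), truncated at jmax."""
    term = math.exp(-p)
    total = term
    points = [total]
    for j in range(1, jmax + 1):
        term *= p / j
        total += term
        points.append(total)
    return points

def xi(omega, p):
    """Bernoulli(p) value built from a uniform omega in [0, 1)."""
    return 0 if omega < 1.0 - p else 1

def eta(omega, p, jmax=50):
    """Poisson(p) value built from the same omega by inverting the CDF."""
    for j, pi_j in enumerate(poisson_cdf_points(p, jmax)):
        if omega < pi_j:
            return j
    return jmax + 1  # reached only through rounding in the far tail

def mismatch_probability(p):
    """Lebesgue measure of {omega : xi != eta}; it equals p * (1 - exp(-p))."""
    first = math.exp(-p) - (1.0 - p)          # [1 - p, e^{-p}): xi = 1, eta = 0
    second = 1.0 - math.exp(-p) * (1.0 + p)   # [e^{-p}(1 + p), 1]: xi = 1, eta >= 2
    return first + second
```

For small $p$ the mismatch probability is close to $p^2$, which is why the bound in (20.3.2) is of the right order.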


If we now consider the triangular array scheme $\xi_{1,n}, \ldots, \xi_{n,n}$, for which

$$\mathbf{P}(\xi_{k,n} = 1) = p_{k,n}, \qquad \mathbf{P}(\xi_{k,n} = 0) = 1 - p_{k,n}, \qquad \sum_{k=1}^{n} p_{k,n} =: \mu_n \to \mu, \qquad \tilde p_n = \max_{k \le n} p_{k,n} \to 0$$

as $n \to \infty$, then Theorem 20.3.1 easily implies convergence of the finite-dimensional distributions of the processes $s_n(t)$ to those of $\pi(t)$, where $s_n(t)$ is constructed as before and $\pi(t)$ is the Poisson process with parameter $\mu$. Consider, for example, the two-dimensional distributions $\mathbf{P}(s_n(t) \ge j,\ s_n(1) = k)$ for $t \in (0, 1)$, $j \le k$. In the notation of Theorem 20.3.1 (to be precise, we have to add the subscript $n$ where appropriate; e.g., the Poisson processes with parameters $\mu_n$ and $\mu$ will be denoted by $\pi_n(t)$ and $\pi(t)$, respectively), we obtain

$$\mathbf{P}(s_n(t) \ge j,\ s_n(1) = k) = \mathbf{P}(s_n(1) = k,\ \theta_j \le t).$$

By Corollary 20.3.1 the right-hand side does not exceed

$$\mathbf{P}(\pi_n(1) = k,\ T_j \le t) + \sum_{l=1}^{n} p_{l,n}^2,$$

where, as is easy to see,

$$\mathbf{P}(\pi_n(1) = k,\ T_j \le t) = \mathbf{P}(\pi_n(1) = k,\ \pi_n(t) \ge j) = \sum_{l=j}^{k} \mathbf{P}(\pi_n(t) = l)\,\mathbf{P}(\pi_n(1 - t) = k - l) \to \mathbf{P}(\pi(1) = k,\ \pi(t) \ge j)$$

as $n \to \infty$, so that

$$\mathbf{P}(s_n(t) \ge j,\ s_n(1) = k) \le \mathbf{P}(\pi(t) \ge j,\ \pi(1) = k) + o(1).$$

The converse inequality is established in a similar way (by using the convergence $\tilde q_n \to 0$ as $n \to \infty$). The required convergence of the finite-dimensional distributions is proved. □
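As a numerical illustration of the bound in Theorem 20.3.1, take the special case $p_{k,n} = \mu/n$, so that $s_n(1)$ is binomial and $\sum_k p_{k,n}^2 = \mu^2/n$. Since the theorem constructs $s_n(1)$ and $\pi(1)$ on one space with $\mathbf{P}(s_n(1) \ne \pi(1)) \le \mu^2/n$, the total variation distance between the two laws cannot exceed $\mu^2/n$. The Python sketch below (function names and the truncation margin are illustrative assumptions) computes that distance directly.

```python
import math

def binomial_pmf(n, p, k):
    """P(Binomial(n, p) = k), computed through logarithms to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(mu, k):
    """P(Poisson(mu) = k), computed through logarithms."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def total_variation(n, mu, margin=60):
    """Total variation distance between Binomial(n, mu/n) and Poisson(mu).

    The Poisson tail beyond n + margin is ignored; for moderate mu it is negligible.
    """
    p = mu / n
    total = 0.0
    for k in range(n + margin + 1):
        b = binomial_pmf(n, p, k) if k <= n else 0.0
        total += abs(b - poisson_pmf(mu, k))
    return 0.5 * total
```

Calling `total_variation(n, 1.0)` for increasing `n` shows the distance shrinking, in line with the bound $\mu^2/n$.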


20.3.2  Convergence  of  Sums  of  Thinning  Renewal  Processes 

The Poisson process can appear as a limiting one in a somewhat different set-up: as a limit for the sum of a large number of homogeneous “slow” renewal processes.

We formulate the setting of the problem more precisely. Let $\eta_i(t)$, $i = 1, 2, \ldots, n$, be mutually independent arbitrary homogeneous renewal processes in the “triangular array scheme” (i.e. they depend on $n$) generated by sequences $\{\tau_k^{(i)}\}_{k=1}^{\infty}$ for which (see Chap. 10; for $k \ge 2$)

$$\mathbf{E}\eta_i(t) = \frac{t}{a_i}, \qquad a_i := a_{i,n} = \mathbf{E}\tau_k^{(i)} \to \infty, \qquad \sum_{i=1}^{n} \frac{1}{a_i} = \mu$$

for a fixed $\mu$, and

$$F_i(t) := \mathbf{P}(\tau_k^{(i)} < t) \le r_{t,n} \to 0$$

for any fixed $t$ as $n \to \infty$, where $r_{t,n}$ does not depend on $i$.


Theorem 20.3.2 Under the above conditions, the finite-dimensional distributions of the process

$$\zeta_n(t) := \sum_{i=1}^{n} \eta_i(t)$$

converge as $n \to \infty$ to those of the Poisson process $\pi(t)$ with the parameter $\mu$: for any $l \ge 1$, $0 < t_1 < \cdots < t_l$, $0 \le k_1 \le \cdots \le k_l$,

$$\mathbf{P}(\zeta_n(t_1) = k_1, \ldots, \zeta_n(t_l) = k_l) \to \mathbf{P}(\pi(t_1) = k_1, \ldots, \pi(t_l) = k_l).$$

(On convergence to the Poisson process, see the remark preceding Theorem 20.3.1.)


Proof First we will prove convergence of the distributions of the increments

$$\zeta_n(t + u) - \zeta_n(u)$$

to the Poisson distribution with parameter $\mu t$. Put $\Delta_i := \eta_i(t + u) - \eta_i(u)$, $p_i := t/a_i$. We have ($\chi_i(u)$ is the excess for the process $\eta_i$; see Sects. 10.2, 10.4)

$$\mathbf{E}\Delta_i = p_i,$$

$$\mathbf{P}(\Delta_i \ge l) \le \mathbf{P}(\chi_i(u) < t)\,\big[\mathbf{P}(\tau_2^{(i)} < t)\big]^{l-1} = \frac{1}{a_i}\int_0^t \mathbf{P}(\tau_2^{(i)} > z)\,dz \cdot F_i(t)^{l-1} \le \frac{t}{a_i}\,(r_{t,n})^{l-1} = p_i\,r_{t,n}^{l-1}. \tag{20.3.3}$$

This implies that

$$\mathbf{E}\Delta_i = p_i = \sum_{l=1}^{\infty} l\,\mathbf{P}(\Delta_i = l) = \mathbf{P}(\Delta_i = 1) + o(p_i),$$

$$\mathbf{P}(\Delta_i = 1) = p_i + o(p_i), \qquad \mathbf{P}(\Delta_i = 0) = 1 - p_i + o(p_i).$$

Therefore the conditions of Corollary 5.4.2 are met, which implies that

$$\zeta_n(t + u) - \zeta_n(u) = \sum_{i=1}^{n} \Delta_i \subset\!\!\!\Rightarrow \mathbf{\Pi}_{\mu t}. \tag{20.3.4}$$

It remains to prove the asymptotic independence of the increments. For simplicity’s sake, consider only two increments, on the intervals $(0, u)$ and $(u, u + t)$, and assume that $\zeta_n(u) = k$. Moreover, suppose that the following event $A$ occurred: the renewals occurred in the processes with numbers $i_1, \ldots, i_k$. It suffices to verify that, given this condition, (20.3.4) will still remain true. Let $B$ be the event that there again were renewals on the interval $(u, u + t)$ in the processes with the numbers $i_1, \ldots, i_k$. Evidently,

$$\mathbf{P}(B \mid A) \le \sum_{l=1}^{k} \mathbf{P}(\tau_2^{(i_l)} < t + u) \le k\,r_{t+u,n} \to 0.$$

Thus the contribution of the processes $\eta_{i_l}$, $l = 1, \ldots, k$, to the sum (20.3.4) given condition $A$ is negligibly small. Consider the remaining $n - k$ processes. For them,

$$\mathbf{P}(\Delta_i \ge 1 \mid A) = \frac{\mathbf{P}(\chi_i(0) \in (u, u + t))}{\mathbf{P}(\chi_i(0) > u)} = \frac{1}{a_i}\int_u^{u+t} (1 - F_i(z))\,dz\,\Big[1 - \frac{1}{a_i}\int_0^u (1 - F_i(z))\,dz\Big]^{-1} = p_i + o(p_i). \tag{20.3.5}$$

Since relation (20.3.3) remains true for conditional distributions of $\Delta_i$ (given $A$ and for $i \ne i_l$, $l = 1, \ldots, k$), we obtain, similarly to the above argument (using now instead of the equality $\sum_l l\,\mathbf{P}(\Delta_i = l) = p_i$ the relation $\mathbf{P}(\Delta_i \ge 1 \mid A) = p_i + o(p_i)$ which follows from (20.3.5)) that

$$\mathbf{P}(\Delta_i = 1 \mid A) = p_i + o(p_i), \qquad \mathbf{P}(\Delta_i = 0 \mid A) = 1 - p_i + o(p_i).$$

It remains to once again make use of Corollary 5.4.2. □


Chapter  21 

Markov  Processes 


Abstract This chapter presents the fundamentals of the theory of general Markov processes in continuous time. Section 21.1 contains the definitions and a discussion of the Markov property and transition functions, and derives the Chapman-Kolmogorov equation. Section 21.2 studies Markov processes in countable state spaces, deriving systems of backward and forward differential equations for transition probabilities. It also establishes the ergodic theorem and contains examples illustrating the presented theory. Section 21.3 deals with continuous time branching processes. Then the elements of the general theory of semi-Markov processes are presented in Sect. 21.4, including the ergodic theorem and some other related results for such processes. Section 21.5 discusses the so-called regenerative processes, establishing their ergodicity and the Laws of Large Numbers and Central Limit Theorem for integrals of functions of their trajectories. Section 21.6 is devoted to diffusion processes. It begins with the classical definition of diffusion, derives the forward and backward Kolmogorov equations for the transition probability function of a diffusion process, and gives a couple of examples of using the equations to compute important characteristics of the respective processes.


21.1  Definitions  and  General  Properties 

Markov processes in discrete time (Markov chains) were considered in Chap. 13. Recall that their main property was that the “future” of the process is independent of its “past” when its “present” is fixed. The same principle underlies the definition of Markov processes in the general case.


21.1.1  Definition  and  Basic  Properties 

Let $(\Omega, \mathfrak{F}, \mathbf{P})$ be a probability space and $\{\xi(t) = \xi(t, \omega),\ t \ge 0\}$ a random process given on it. Set

$$\mathfrak{F}_t := \sigma(\xi(u);\ u \le t), \qquad \mathfrak{F}_{[t,\infty)} := \sigma(\xi(u);\ u \ge t),$$

so that the variable $\xi(u)$ is $\mathfrak{F}_t$-measurable for $u \le t$ and $\mathfrak{F}_{[t,\infty)}$-measurable for $u \ge t$. The $\sigma$-algebra $\sigma(\mathfrak{F}_t, \mathfrak{F}_{[t,\infty)})$ is generated by the variables $\xi(u)$ for all $u$ and may coincide with $\mathfrak{F}$ in the case of the sample probability space.

Definition 21.1.1 We say that $\xi(t)$ is a Markov process if, for any $t$, $A \in \mathfrak{F}_t$, and $B \in \mathfrak{F}_{[t,\infty)}$, we have

$$\mathbf{P}(AB \mid \xi(t)) = \mathbf{P}(A \mid \xi(t))\,\mathbf{P}(B \mid \xi(t)). \tag{21.1.1}$$

This expresses precisely the fact that the future is independent of the past when the present is fixed (conditional independence of $\mathfrak{F}_t$ and $\mathfrak{F}_{[t,\infty)}$ given $\xi(t)$).

We  will  now  show  that  the  above  definition  is  equivalent  to  the  following. 

Definition 21.1.2 We say that $\xi(t)$ is a Markov process if, for any bounded $\mathfrak{F}_{[t,\infty)}$-measurable random variable $\eta$,

$$\mathbf{E}(\eta \mid \mathfrak{F}_t) = \mathbf{E}(\eta \mid \xi(t)). \tag{21.1.2}$$

It suffices to take $\eta$ to be functions of the form $\eta = f(\xi(s))$ for $s \ge t$.

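Proof of the equivalence Let (21.1.1) hold. By the monotone convergence theorem it suffices to prove (21.1.2) for simple functions $\eta$. To this end it suffices, in turn, to prove (21.1.2) for $\eta = I_B$, the indicator of the set $B \in \mathfrak{F}_{[t,\infty)}$. Let $A \in \mathfrak{F}_t$. Then, by (21.1.1),

$$\mathbf{P}(AB) = \mathbf{E}\,\mathbf{P}(AB \mid \xi(t)) = \mathbf{E}\big[\mathbf{P}(A \mid \xi(t))\,\mathbf{P}(B \mid \xi(t))\big] = \mathbf{E}\,\mathbf{E}\big[I_A\,\mathbf{P}(B \mid \xi(t)) \mid \xi(t)\big] = \mathbf{E}\big[I_A\,\mathbf{P}(B \mid \xi(t))\big]. \tag{21.1.3}$$

On the other hand,

$$\mathbf{P}(AB) = \mathbf{E}[I_A I_B] = \mathbf{E}\big[I_A\,\mathbf{P}(B \mid \mathfrak{F}_t)\big]. \tag{21.1.4}$$

Because (21.1.3) and (21.1.4) hold for any $A \in \mathfrak{F}_t$, this means that $\mathbf{P}(B \mid \mathfrak{F}_t) = \mathbf{P}(B \mid \xi(t))$.

Conversely, let (21.1.2) hold. Then, for $A \in \mathfrak{F}_t$ and $B \in \mathfrak{F}_{[t,\infty)}$, we have

$$\mathbf{P}(AB \mid \xi(t)) = \mathbf{E}\big[\mathbf{E}(I_A I_B \mid \mathfrak{F}_t) \mid \xi(t)\big] = \mathbf{E}\big[I_A\,\mathbf{E}(I_B \mid \mathfrak{F}_t) \mid \xi(t)\big] = \mathbf{E}\big[I_A\,\mathbf{E}(I_B \mid \xi(t)) \mid \xi(t)\big] = \mathbf{P}(B \mid \xi(t))\,\mathbf{P}(A \mid \xi(t)). \qquad □$$

It remains to verify that it suffices to take $\eta = f(\xi(s))$, $s \ge t$, in (21.1.2). In order to do this, we need one more equivalent definition of a Markov process.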

Definition 21.1.3 We say that $\xi(t)$ is a Markov process if, for any bounded function $f$ and any $t_1 < t_2 < \cdots < t_n \le t$,

$$\mathbf{E}\big(f(\xi(t)) \mid \xi(t_1), \ldots, \xi(t_n)\big) = \mathbf{E}\big(f(\xi(t)) \mid \xi(t_n)\big). \tag{21.1.5}$$

Proof of the equivalence Relation (21.1.5) follows in an obvious way from (21.1.2). Now assume that (21.1.5) holds. Then, for any $A \in \sigma(\xi(t_1), \ldots, \xi(t_n))$,

$$\mathbf{E}\big(f(\xi(t));\ A\big) = \mathbf{E}\big[\mathbf{E}\big(f(\xi(t)) \mid \xi(t_n)\big);\ A\big]. \tag{21.1.6}$$

Both parts of (21.1.6) are measures coinciding on the algebra of cylinder sets. Therefore, by the theorem on uniqueness of extension of a measure, they coincide on the $\sigma$-algebra generated by these sets, i.e. on $\mathfrak{F}_{t_n}$. In other words, (21.1.6) holds for any $A \in \mathfrak{F}_{t_n}$, which is equivalent to the equality

$$\mathbf{E}\big[f(\xi(t)) \mid \mathfrak{F}_{t_n}\big] = \mathbf{E}\big[f(\xi(t)) \mid \xi(t_n)\big]$$

for any $t_n \le t$. Relation (21.1.2) for $\eta = f(\xi(t))$ is proved. □


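We now prove that in (21.1.2) it suffices to take $\eta = f(\xi(s))$, $s \ge t$. Let $t \le u_1 < \cdots < u_n$. We prove that then (21.1.2) is true for

$$\eta = \prod_{i=1}^{n} f_i(\xi(u_i)). \tag{21.1.7}$$

We will make use of induction and assume that equality (21.1.2) holds for the functions

$$\gamma = \prod_{i=1}^{n-1} f_i(\xi(u_i))$$

(for $n = 1$ relation (21.1.2) is true). Then, putting $g(\xi(u_{n-1})) := \mathbf{E}[f_n(\xi(u_n)) \mid \xi(u_{n-1})]$, we obtain

$$\mathbf{E}(\eta \mid \mathfrak{F}_t) = \mathbf{E}\big[\mathbf{E}(\eta \mid \mathfrak{F}_{u_{n-1}}) \mid \mathfrak{F}_t\big] = \mathbf{E}\big[\gamma\,\mathbf{E}(f_n(\xi(u_n)) \mid \mathfrak{F}_{u_{n-1}}) \mid \mathfrak{F}_t\big] = \mathbf{E}\big[\gamma\,\mathbf{E}(f_n(\xi(u_n)) \mid \xi(u_{n-1})) \mid \mathfrak{F}_t\big] = \mathbf{E}\big[\gamma\,g(\xi(u_{n-1})) \mid \mathfrak{F}_t\big].$$

By the induction hypothesis this implies that

$$\mathbf{E}(\eta \mid \mathfrak{F}_t) = \mathbf{E}\big[\gamma\,g(\xi(u_{n-1})) \mid \xi(t)\big]$$

and, therefore, that $\mathbf{E}(\eta \mid \mathfrak{F}_t)$ is $\sigma(\xi(t))$-measurable and

$$\mathbf{E}(\eta \mid \xi(t)) = \mathbf{E}\big(\mathbf{E}(\eta \mid \mathfrak{F}_t) \mid \xi(t)\big) = \mathbf{E}(\eta \mid \mathfrak{F}_t).$$

We proved that (21.1.2) holds for $\sigma(\xi(u_1), \ldots, \xi(u_n))$-measurable functions of the form (21.1.7). By passing to the limit we establish first that (21.1.2) holds for simple functions, and then that it holds for any $\mathfrak{F}_{[t,\infty)}$-measurable functions. □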


21.1.2  Transition  Probability 


We saw that, for a Markov process $\xi(t)$, the conditional probability

$$\mathbf{P}(\xi(t) \in B \mid \mathfrak{F}_s) = \mathbf{P}(\xi(t) \in B \mid \xi(s)) \quad \text{for } t > s$$

is a Borel function of $\xi(s)$ which we will denote by

$$P(s, \xi(s);\ t, B) := \mathbf{P}(\xi(t) \in B \mid \xi(s)).$$

One can say that $P(s, x;\ t, B)$ as a function of $B$ and $x$ is the conditional distribution (see Sect. 4.9) of $\xi(t)$ given that $\xi(s) = x$. By the Markov property, it satisfies the relation ($s < u < t$)

$$P(s, x;\ t, B) = \int P(s, x;\ u, dy)\,P(u, y;\ t, B), \tag{21.1.8}$$

which follows from the equality

$$\mathbf{P}(\xi(t) \in B \mid \xi(s) = x) = \mathbf{E}\big[\mathbf{P}(\xi(t) \in B \mid \mathfrak{F}_u) \mid \xi(s) = x\big] = \mathbf{E}\big[P(u, \xi(u);\ t, B) \mid \xi(s) = x\big].$$

Equation (21.1.8) is called the Chapman-Kolmogorov equation.

The function $P(s, x;\ t, B)$ can be used in an analytic definition of a Markov process. First we need to clarify what properties a function $P_{x,B}(s, t)$ should possess in order that there exists a Markov process $\xi(t)$ for which

$$P_{x,B}(s, t) = P(s, x;\ t, B).$$

Let $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ be a measurable space.

Definition 21.1.4 A function $P_{x,B}(s, t)$ is said to be a transition function on $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ if it satisfies the following conditions:

(1) As a function of $B$, $P_{x,B}(s, t)$ is a probability distribution for each $s \le t$, $x \in \mathcal{X}$.

(2) $P_{x,B}(s, t)$ is measurable in $x$ for each $s \le t$ and $B \in \mathfrak{B}_{\mathcal{X}}$.

(3) For $0 \le s < u < t$ and all $x$ and $B$,

$$P_{x,B}(s, t) = \int P_{x,dy}(s, u)\,P_{y,B}(u, t)$$

(the Chapman-Kolmogorov equation).

(4) $P_{x,B}(s, t) = I_B(x)$ for $s = t$.

Here properties (1) and (2) ensure that $P_{x,B}(s, t)$ can be a conditional distribution (cf. Sect. 4.9).

Now define, with the help of $P_{x,B}(s, t)$, the finite-dimensional distributions of a process $\xi(t)$ with the initial condition $\xi(0) = a$ by the formula

$$\mathbf{P}(\xi(t_1) \in dy_1, \ldots, \xi(t_n) \in dy_n) = P_{a,dy_1}(0, t_1)\,P_{y_1,dy_2}(t_1, t_2) \cdots P_{y_{n-1},dy_n}(t_{n-1}, t_n). \tag{21.1.9}$$

By virtue of properties (3) and (4), these distributions are consistent and therefore by the Kolmogorov theorem define a process $\xi(t)$ in $(\mathbb{R}^T, \mathfrak{B}_{\mathbb{R}}^T)$, where $T = [0, \infty)$.

By formula (21.1.9) and rule (21.1.5),

$$\mathbf{P}\big(\xi(t_n) \in B_n \mid (\xi(t_1), \ldots, \xi(t_{n-1})) = (y_1, \ldots, y_{n-1})\big) = P_{y_{n-1},B_n}(t_{n-1}, t_n) = \mathbf{P}\big(\xi(t_n) \in B_n \mid \xi(t_{n-1}) = y_{n-1}\big) = P(t_{n-1}, y_{n-1};\ t_n, B_n).$$

We could also verify this equality in a more formal way using the fact that the integrals of both sides over the set $\{\xi(t_1) \in B_1, \ldots, \xi(t_{n-1}) \in B_{n-1}\}$ coincide.

Thus, by virtue of Definition 21.1.3, we have constructed a Markov process $\xi(t)$ for which

$$P(s, x;\ t, B) = P_{x,B}(s, t).$$

This function will also be called the transition function (or transition probability) of the process $\xi(t)$.


Definition 21.1.5 A Markov process $\xi(t)$ is said to be homogeneous if $P(s, x;\ t, B)$, as a function of $s$ and $t$, depends on the difference $t - s$ only:

$$P(s, x;\ t, B) = P(t - s;\ x, B).$$

This is the probability of transition during a time interval of length $t - s$ from $x$ to $B$. If

$$P(u;\ x, B) = \int_B p(u;\ x, y)\,dy$$

then the function $p(u;\ x, y)$ is said to be a transition density.

It is not hard to see that the Wiener and Poisson processes are both homogeneous Markov processes. For example, for the Wiener process,

$$p(u;\ x, y) = \frac{1}{\sqrt{2\pi u}}\,e^{-(x - y)^2/(2u)}.$$
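For the Wiener transition density, the Chapman-Kolmogorov equation (21.1.8) reads $p(s + t;\ x, z) = \int p(s;\ x, y)\,p(t;\ y, z)\,dy$. The Python sketch below checks this identity numerically with the trapezoidal rule. The function names, the integration window `half_width` and the number of steps are illustrative choices, not part of the text.

```python
import math

def wiener_density(u, x, y):
    """Transition density p(u; x, y) of the standard Wiener process."""
    return math.exp(-(x - y) ** 2 / (2.0 * u)) / math.sqrt(2.0 * math.pi * u)

def chapman_kolmogorov_rhs(s, t, x, z, half_width=12.0, steps=6000):
    """Trapezoidal approximation of the integral of p(s; x, y) p(t; y, z) over y."""
    lo = min(x, z) - half_width
    hi = max(x, z) + half_width
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0   # end points get half weight
        total += weight * wiener_density(s, x, y) * wiener_density(t, y, z)
    return total * h

# Compare both sides of the equation for one choice of arguments.
lhs = wiener_density(0.5 + 1.5, 0.2, -0.7)
rhs = chapman_kolmogorov_rhs(0.5, 1.5, 0.2, -0.7)
```

The two values agree up to the quadrature error, which is tiny here because the integrand is a smooth, rapidly decaying Gaussian.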


21.2  Markov  Processes  with  Countable  State  Spaces.  Examples 
21.2.1  Basic  Properties  of  the  Process 

Assume without loss of generality that the “discrete state space” $\mathcal{X}$ coincides with the set of integers $\{0, 1, 2, \ldots\}$. For simplicity’s sake we will only consider homogeneous Markov processes.

The transition function of such a process is determined by the collection of functions $P(t;\ i, j) = p_{ij}(t)$ which form a stochastic matrix $P(t) = \|p_{ij}(t)\|$ (with $p_{ij}(t) \ge 0$, $\sum_j p_{ij}(t) = 1$). Chapman-Kolmogorov’s equation now takes the form

$$p_{ij}(t + s) = \sum_k p_{ik}(t)\,p_{kj}(s),$$

or, which is the same, in the matrix form,

$$P(t + s) = P(t)P(s) = P(s)P(t). \tag{21.2.1}$$

In what follows, we consider only stochastically continuous processes for which $\xi(t + s) \overset{p}{\to} \xi(t)$ as $s \to 0$, which is equivalent in the case under consideration to each of the following three relations:

$$\mathbf{P}(\xi(t + s) \ne \xi(t)) \to 0, \qquad P(t + s) \to P(t), \qquad P(s) \to P(0) \equiv E \tag{21.2.2}$$

as $s \to 0$ (component-wise; $E$ is the unit matrix).

We will also assume that convergence in (21.2.2) is uniform (for a finite $\mathcal{X}$ this is always the case).

According to the separability requirement, we will assume that $\xi(t)$ cannot change its state in “zero time” more than once (thus excluding the effects illustrated in Example 18.1.1, i.e. assuming that if $\xi(t) = j$ then, with probability 1, $\xi(t + s) = j$ for $s \in [0, \tau)$, $\tau = \tau(\omega) > 0$). In that case, the trajectories of the processes will be piece-wise constant (right-continuous for definiteness), i.e. the time axis is divided into half-intervals $[0, \tau_1)$, $[\tau_1, \tau_1 + \tau_2)$, $\ldots$, on which $\xi(t)$ is constant. Put

$$q_j(t) := \mathbf{P}(\xi(u) = j,\ 0 \le u \le t \mid \xi(0) = j) = \mathbf{P}(\tau_1 > t).$$


Theorem 21.2.1 Under the above assumptions (stochastic continuity and separability),

$$q_i(t) = e^{-q_i t},$$

where $q_i < \infty$; moreover, $q_i > 0$ if $p_{ii}(t) \not\equiv 1$. There exist the limits

$$\lim_{t \to 0} \frac{1 - p_{ii}(t)}{t} = q_i, \qquad \lim_{t \to 0} \frac{p_{ij}(t)}{t} = q_{ij}, \quad j \ne i, \tag{21.2.3}$$

where $\sum_{j:\,j \ne i} q_{ij} = q_i$.

Proof By the Markov property,

$$q_i(t + s) = q_i(t)\,q_i(s),$$

and $q_i(t)$ is non-increasing in $t$. Therefore there exists a unique solution $q_i(t) = e^{-q_i t}$ of this equation, where $q_i < \infty$ since $\mathbf{P}(\tau_1 > 0) = 1$, and $q_i > 0$ because $q_i(t) < 1$ when $p_{ii}(t) \not\equiv 1$.

Let further $0 < t_0 < t_1 < \cdots < t_n < t$. Since the events

$$\{\xi(u) = i \text{ for } u \le t_r,\ \xi(t_{r+1}) = j\}, \quad r = 0, \ldots, n - 1;\ j \ne i,$$

are disjoint,

$$p_{ii}(t) = q_i(t) + \sum_{r=0}^{n-1} \sum_{j:\,j \ne i} q_i(t_r)\,p_{ij}(t_{r+1} - t_r)\,p_{ji}(t - t_{r+1}). \tag{21.2.4}$$

Here, by condition (21.2.2), $p_{ji}(t - t_{r+1}) < \varepsilon_t$ for all $j \ne i$, and $\varepsilon_t \to 0$ as $t \to 0$, so that the sum in (21.2.4) does not exceed

$$\varepsilon_t \sum_{r=0}^{n-1} \sum_{j:\,j \ne i} q_i(t_r)\,p_{ij}(t_{r+1} - t_r) = \varepsilon_t\,\mathbf{P}\Big(\bigcup_{r=1}^{n} \{\xi(t_r) \ne i\} \,\Big|\, \xi(0) = i\Big) \le \varepsilon_t\,(1 - q_i(t)),$$

$$p_{ii}(t) \le q_i(t) + \varepsilon_t\,(1 - q_i(t)).$$

Together with the obvious inequality $p_{ii}(t) \ge q_i(t)$ this gives

$$1 - q_i(t) \ge 1 - p_{ii}(t) \ge (1 - q_i(t))(1 - \varepsilon_t)$$

(i.e. the asymptotic behaviour of $1 - q_i(t)$ and $1 - p_{ii}(t)$ as $t \to 0$ is identical). This implies the second assertion of the theorem (i.e., the first relation in (21.2.3)).

Now let $t_r := rt/n$. Consider the transition probabilities

$$p_{ij}(t) \ge \sum_{r=0}^{n-1} q_i(t_r)\,p_{ij}(t/n)\,q_j(t - t_{r+1}) \ge (1 - \varepsilon_t)\,p_{ij}(t/n) \sum_{r=0}^{n-1} e^{-q_i r t/n} \ge (1 - \varepsilon_t)\,p_{ij}(t/n)\,\frac{(1 - e^{-q_i t})\,n}{q_i t}.$$

This implies that

$$p_{ij}(t) \ge (1 - \varepsilon_t)\,\frac{1 - e^{-q_i t}}{q_i}\,\limsup_{\delta \to 0} \frac{p_{ij}(\delta)}{\delta}$$

and that the upper limit on the right-hand side is bounded. Passing to the limit as $t \to 0$, we obtain

$$\liminf_{t \to 0} \frac{p_{ij}(t)}{t} \ge \limsup_{\delta \to 0} \frac{p_{ij}(\delta)}{\delta}.$$

Since $\sum_{j:\,j \ne i} p_{ij}(t) = 1 - p_{ii}(t)$, we have $\sum_{j:\,j \ne i} q_{ij} = q_i$. The theorem is proved. □

The theorem shows that the quantities

$$p_{ij} = \frac{q_{ij}}{q_i}, \quad j \ne i, \qquad p_{ii} = 0$$

form a stochastic matrix and give the probabilities of transition from $i$ to $j$ during an infinitesimal time interval $\Delta$ given the process $\xi(t)$ left the state $i$ during that time interval:

$$\mathbf{P}(\xi(t + \Delta) = j \mid \xi(t) = i,\ \xi(t + \Delta) \ne i) = \frac{p_{ij}(\Delta)}{1 - p_{ii}(\Delta)} \to \frac{q_{ij}}{q_i}$$

as $\Delta \to 0$.

Thus the evolution of $\xi(t)$ can be thought of as follows. If $\xi(0) = X_0$, then $\xi(t)$ stays at $X_0$ for a random time $\tau_1 \subset\!\!\!= \mathbf{\Gamma}_{q_{X_0}}$. Then $\xi(t)$ passes to a state $X_1$ with probability $p_{X_0 X_1}$. Further, $\xi(t) = X_1$ over the time interval $[\tau_1, \tau_1 + \tau_2)$, $\tau_2 \subset\!\!\!= \mathbf{\Gamma}_{q_{X_1}}$, after which the system changes its state to $X_2$ and so on. It is clear that $X_0, X_1, \ldots$ is a homogeneous Markov chain with the transition matrix $\|p_{ij}\|$. Therefore the further study of $\xi(t)$ can be reduced in many aspects to that of the Markov chain $\{X_n;\ n \ge 0\}$, which was carried out in detail in Chap. 13.
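The description above translates directly into a simulation scheme: hold in state $i$ for an exponential time with rate $q_i$, then jump to $j \ne i$ with probability $q_{ij}/q_i$. The Python sketch below follows that recipe for a finite state space. The function name, the list-of-lists representation of the rates and the example matrix are illustrative assumptions.

```python
import random

def simulate_path(Q, start, horizon, rng):
    """Simulate jump epochs and visited states on [0, horizon).

    Q is a square list of lists with Q[i][j] = q_ij for j != i and Q[i][i] = -q_i.
    Returns (times, states): states[m] is occupied from times[m] until the next epoch.
    """
    times, states = [0.0], [start]
    t, i = 0.0, start
    while True:
        q_i = -Q[i][i]
        if q_i == 0.0:                      # absorbing state: the process never leaves i
            break
        t += rng.expovariate(q_i)           # holding time, exponential with rate q_i
        if t >= horizon:
            break
        targets = [j for j in range(len(Q)) if j != i]
        weights = [Q[i][j] for j in targets]            # proportional to p_ij = q_ij / q_i
        i = rng.choices(targets, weights=weights)[0]    # next state of the embedded chain
        times.append(t)
        states.append(i)
    return times, states

# Example with three states; the rates are made up for illustration.
example_Q = [[-2.0, 2.0, 0.0], [1.0, -3.0, 2.0], [0.5, 0.5, -1.0]]
example_times, example_states = simulate_path(example_Q, 0, 50.0, random.Random(12345))
```

The sequence `example_states` is a realisation of the embedded chain $X_0, X_1, \ldots$, and consecutive differences of `example_times` are the holding times.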

We see that the evolution of $\xi(t)$ is completely specified by the quantities $q_{ij}$ and $q_i$ forming the matrix

$$Q = \|q_{ij}\| = \lim_{t \to 0} \frac{P(t) - P(0)}{t}, \tag{21.2.5}$$

where we put $q_{ii} := -q_i$, so that $\sum_j q_{ij} = 0$. We can also justify this claim using an analytical approach. To simplify the technical side of the exposition, we will assume, where it is needed, that the entries of the matrix $Q$ are bounded and convergence in (21.2.3) is uniform in $i$.

Denote by $e^A$ the matrix-valued function

$$e^A := E + \sum_{k=1}^{\infty} \frac{A^k}{k!}.$$


Theorem 21.2.2 The transition probabilities $p_{ij}(t)$ satisfy the systems of differential equations

$$P'(t) = P(t)Q, \tag{21.2.6}$$

$$P'(t) = QP(t). \tag{21.2.7}$$

Each of the systems (21.2.6) and (21.2.7) has a unique solution

$$P(t) = e^{Qt}.$$

It is clear that the solution can be obtained immediately by formally integrating equation (21.2.6).

Proof By virtue of (21.2.1), (21.2.2) and (21.2.5),

$$P'(t) = \lim_{s \to 0} \frac{P(t + s) - P(t)}{s} = \lim_{s \to 0} P(t)\,\frac{P(s) - E}{s} = P(t)Q. \tag{21.2.8}$$

In the same way we obtain, from the equality

$$P(t + s) - P(t) = (P(s) - E)P(t),$$

the second equation (21.2.7). The passage to the limit is justified by the assumptions we made.

Further, it follows from (21.2.6) that the function $P(t)$ is infinitely differentiable, and

$$P^{(k)}(t) = P(t)Q^k,$$

$$P(t) - P(0) = \sum_{k=1}^{\infty} P^{(k)}(0)\,\frac{t^k}{k!} = \sum_{k=1}^{\infty} \frac{Q^k t^k}{k!},$$

$$P(t) = P(0)\,e^{Qt}.$$


The theorem is proved. □

Because of the derivation method, (21.2.6) is called the forward Kolmogorov equation, and (21.2.7) is known as the backward Kolmogorov equation (the time increment is taken after or before the basic time interval, respectively).

The difference between these equations becomes even more apparent in the case of inhomogeneous Markov processes, when the transition probabilities

$$\mathbf{P}(\xi(t) = j \mid \xi(s) = i) = p_{ij}(s, t), \quad s \le t,$$

depend on two time arguments: $s$ and $t$. In that case, (21.2.1) becomes the equality $P(s, t + u) = P(s, t)P(t, t + u)$, and the backward and forward equations have the form

$$\frac{\partial P(s, t)}{\partial s} = -Q(s)P(s, t), \qquad \frac{\partial P(s, t)}{\partial t} = P(s, t)Q(t),$$

respectively, where

$$Q(t) = \lim_{u \to 0} \frac{P(t, t + u) - E}{u}.$$

The reader can derive these relations independently.
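For a finite state space the claims of Theorem 21.2.2 are easy to check numerically. The Python sketch below sums the series for $e^{Qt}$ directly, which is adequate for a small matrix with moderate entries but is not a production-quality matrix exponential. The helper names and the number of series terms are illustrative assumptions.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(Q, t, terms=80):
    """Partial sum E + sum_{k=1}^{terms} (Qt)^k / k! of the series for e^{Qt}."""
    n = len(Q)
    A = [[Q[i][j] * t for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]            # holds (Qt)^k / k!, starting at k = 0
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def max_abs_diff(A, B):
    """Largest entry-wise absolute difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))
```

With these helpers one can compare `transition_matrix(Q, t + s)` with `mat_mul(transition_matrix(Q, t), transition_matrix(Q, s))` to see (21.2.1), and compare a finite-difference derivative of `transition_matrix` in `t` with both `mat_mul(P, Q)` and `mat_mul(Q, P)` to see (21.2.6) and (21.2.7).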

What are the general conditions for existence of a stationary limiting distribution? We can use here an approach similar to that employed in Chap. 13.

Let $\xi^{(i)}(t)$ be a process with the initial value $\xi^{(i)}(0) = i$ and right-continuous trajectories. For a given $i_0$, put

$$\nu^{(i)} := \min\{t \ge 0 : \xi^{(i)}(t) = i_0\} =: \nu_0,$$

$$\nu_k := \min\{t \ge \nu_{k-1} + 1 : \xi^{(i)}(t) = i_0\}, \quad k = 1, 2, \ldots$$

Here in the second formula we consider the values $t \ge \nu_{k-1} + 1$, since for $t \ge \nu_{k-1}$ we would have $\nu_k \equiv \nu_{k-1}$. Clearly, $\mathbf{P}(\nu_k - \nu_{k-1} = 1) > 0$, and $\mathbf{P}(\nu_k - \nu_{k-1} \in (t, t + h)) > 0$ for any $t \ge 1$ and $h > 0$ provided that $p_{i_0 i_0}(t) \not\equiv 1$.

Note also that the variables $\nu_k$, $k = 0, 1, \ldots$, are not defined for all elementary outcomes. We put $\nu_0 = \infty$ if $\xi^{(i)}(t) \ne i_0$ for all $t \ge 0$. A similar convention is used for $k \ge 1$. The following ergodic theorem holds.


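Theorem 21.2.3 Let there exist a state $i_0$ such that $\mathbf{E}\nu_1 < \infty$ and $\mathbf{P}(\nu^{(i)} < \infty) = 1$ for all $i \in \mathcal{X}_0 \subset \mathcal{X}$. Then there exist the limits

$$\lim_{t \to \infty} p_{ij}(t) = p_j \tag{21.2.9}$$

which are independent of $i \in \mathcal{X}_0$.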


Proof As was the case for Markov chains, the epochs $\nu_1, \nu_2, \ldots$ divide the time axis into independent cycles of the same nature, each of them being completed when the system returns for the first time (after one time unit) to the state $i_0$. Consider the renewal process generated by the sums $\nu_k$, $k = 0, 1, \ldots$, of independent random variables $\nu_0$, $\nu_k - \nu_{k-1}$, $k = 1, 2, \ldots$. Let

$$\eta(t) := \min\{k : \nu_k > t\}, \qquad \gamma(t) := t - \nu_{\eta(t)-1}, \qquad H(t) := \sum_{k=0}^{\infty} \mathbf{P}(\nu_k < t).$$

The event $A_{dv} := \{\gamma(t) \in [v, v + dv)\}$ can be represented as the intersection of the events

$$B_{dv} := \bigcup_{k \ge 0} \{\nu_k \in (t - v - dv,\ t - v]\} \in \mathfrak{F}_{t-v}$$

and $C_v := \{\xi(u) \ne i_0 \text{ for } u \in [t - v + 1, t]\} \in \mathfrak{F}_{[t-v,\infty)}$. We have

$$p_{ij}(t) = \int_0^t \mathbf{P}\big(\xi^{(i)}(t) = j,\ \gamma(t) \in [v, v + dv)\big) = \int_0^t \mathbf{P}\big(\xi^{(i)}(t) = j,\ B_{dv} C_v\big)$$

$$= \int_0^t \mathbf{E}\big[I_{B_{dv}}\,\mathbf{P}\big(\xi^{(i)}(t) = j,\ C_v \mid \mathfrak{F}_{t-v}\big)\big] = \int_0^t \mathbf{E}\big[I_{B_{dv}}\,\mathbf{P}\big(\xi^{(i)}(t) = j,\ C_v \mid \xi(t - v)\big)\big].$$

On the set $B_{dv}$, one has $\xi(t - v) = i_0$, and hence the probability inside the last integral is equal to

$$\mathbf{P}\big(\xi^{(i_0)}(v) = j,\ \xi(u) \ne i_0 \text{ for } u \in [1, v]\big) =: g(v)$$

and is independent of $t$ and $i$. Since $\mathbf{P}(B_{dv}) = dH(t - v)$, one has

$$p_{ij}(t) = \int_0^t g(v)\,\mathbf{P}(B_{dv}) = \int_0^t g(v)\,dH(t - v).$$

By the key renewal theorem, as $t \to \infty$, this integral converges to

$$\frac{1}{\mathbf{E}\nu_1} \int_0^{\infty} g(v)\,dv.$$

The existence of the last integral follows from the inequality $g(v) \le \mathbf{P}(\nu_1 > v)$. The theorem is proved. □


Theorem 21.2.4 If the stationary distribution

\[
P = \lim_{t \to \infty} P(t)
\]

exists with all the rows of the matrix $P$ being identical, then it is a unique solution of the equation

\[
PQ = 0. \tag{21.2.10}
\]

It is evident that Eq. (21.2.10) is obtained by setting $P'(t) = 0$ in (21.2.6). Equation (21.2.7) gives the trivial equality $QP = 0$.


Proof Equation (21.2.10) is obtained by passing to the limit in (21.2.8) first as $t \to \infty$ and then as $s \to 0$. Now assume that $P_1$ is a solution of (21.2.10), i.e. $P_1 Q = 0$. Then $P_1 P(t) = P_1$ for $t < 1$, since

\[
P_1 \bigl(P(t) - P(0)\bigr) = P_1 \sum_{k=1}^{\infty} \frac{Q^k t^k}{k!} = 0.
\]

Further, $P_1 = P_1 P^k(t) = P_1 P(kt)$, $P(kt) \to P$ as $k \to \infty$, and hence $P_1 = P_1 P = P$. The theorem is proved. □


Now consider a Markov chain $\{X_n\}$ in discrete time with transition probabilities $p_{ij} = q_{ij}/q_i$, $i \ne j$, $p_{ii} = 0$. Suppose that this chain is ergodic (see Theorem 13.4.1). Then its stationary probabilities $\{\pi_j\}$ satisfy Eqs. (13.4.2). Now note that Eq. (21.2.10) can be written in the form

\[
p_j q_j = \sum_k p_k q_k p_{kj},
\]

which has an obvious solution $p_j = c\pi_j/q_j$, $c = \mathrm{const}$. Therefore, if

\[
\sum_j \frac{\pi_j}{q_j} < \infty \tag{21.2.11}
\]

then there exists a solution to (21.2.10) given by

\[
p_j = \frac{\pi_j}{q_j} \Bigl(\sum_k \frac{\pi_k}{q_k}\Bigr)^{-1}. \tag{21.2.12}
\]

In Sects. 21.4 and 21.5 we will derive the ergodic theorem for processes of a more general form than the one in the present section. That theorem will imply, in particular, that ergodicity of $\{X_n\}$ and convergence (21.2.11) imply (21.2.9). Recall that, for ergodicity of $\{X_n\}$, it suffices, in turn, that Eqs. (13.4.2) have a solution $\{\pi_j\}$. Thus the existence of solution (21.2.12) implies the ergodicity of $\xi(t)$.
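
As a numerical illustration of (21.2.12), the following sketch takes a small hypothetical three-state example (the rates `q` and the jump matrix `p` are invented for illustration and do not come from the text), finds the stationary law of the embedded chain by power iteration, reweights it by $1/q_j$, and checks that the result satisfies $PQ = 0$.

```python
def stationary_from_embedded(q, p, iters=2000):
    """Return p_j proportional to pi_j / q_j, where pi is stationary for the jump chain p."""
    n = len(q)
    pi = [1.0 / n] * n
    # Power iteration: pi <- pi * p (converges for an ergodic finite chain).
    for _ in range(iters):
        pi = [sum(pi[k] * p[k][j] for k in range(n)) for j in range(n)]
    weights = [pi[j] / q[j] for j in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical data: exit rates q_i and embedded jump probabilities p_ij (p_ii = 0).
q = [1.0, 2.0, 4.0]
p = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0]]

p_stat = stationary_from_embedded(q, p)

# Generator matrix: q_ii = -q_i, q_ij = q_i * p_ij for i != j.
n = len(q)
Q = [[(-q[i] if i == j else q[i] * p[i][j]) for j in range(n)] for i in range(n)]

# Residual of the row-vector equation p_stat * Q = 0.
residual = [sum(p_stat[i] * Q[i][j] for i in range(n)) for j in range(n)]
print(p_stat, residual)
```

For this example the embedded chain has $\pi = (4/9, 3/9, 2/9)$, so the sketch should return values close to $(2/3, 1/4, 1/12)$ with a residual that vanishes up to rounding.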


21.2.2  Examples 


Example 21.2.1 The Poisson process $\xi(t)$ with parameter $\lambda$ is a Markov process for which $q_i = \lambda$, $q_{i,i+1} = \lambda$, and $p_{i,i+1} = 1$, $i = 0, 1, \ldots$ For this process, the stationary distribution $p = (p_0, p_1, \ldots)$ does not exist (each trajectory goes to infinity).

Example 21.2.2 Birth-and-death processes. These are processes for which, for $i \ge 1$,

\[
p_{ij}(\Delta) =
\begin{cases}
\lambda_i \Delta + o(\Delta) & \text{for } j = i + 1, \\
\mu_i \Delta + o(\Delta) & \text{for } j = i - 1, \\
o(\Delta) & \text{for } |j - i| \ge 2,
\end{cases}
\]

so that

\[
p_{ij} =
\begin{cases}
\dfrac{\lambda_i}{\lambda_i + \mu_i} & \text{for } j = i + 1, \\[2mm]
\dfrac{\mu_i}{\lambda_i + \mu_i} & \text{for } j = i - 1
\end{cases}
\]

are probabilities of birth and death, respectively, of a particle in a certain population given that the population consisted of $i$ particles and changed its composition. For $i = 0$ one should put $\mu_0 := 0$. Establishing conditions for the existence of a stationary regime is a rather difficult problem (related mainly to finding conditions under which the trajectory escapes to infinity). If the stationary regime exists, then according to Theorem 21.2.4 the stationary probabilities $p_j$ can be uniquely determined from the recursive relations (see Eq. (21.2.10), in our case $q_{ii} = -q_i = -(\lambda_i + \mu_i)$)

\[
\begin{aligned}
-p_0 \lambda_0 + p_1 \mu_1 &= 0, \\
p_0 \lambda_0 - p_1(\lambda_1 + \mu_1) + p_2 \mu_2 &= 0, \\
&\;\;\vdots \\
p_{k-1} \lambda_{k-1} - p_k(\lambda_k + \mu_k) + p_{k+1} \mu_{k+1} &= 0, \quad \ldots
\end{aligned}
\tag{21.2.13}
\]

and condition $\sum_j p_j = 1$.
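
Adding the equations in (21.2.13) successively gives $p_k \mu_k = p_{k-1} \lambda_{k-1}$, so the solution can be built by a one-step recursion. The sketch below does this for a finite truncation of the state space; the truncation (putting the top birth rate to zero) and the numerical rates are assumptions made only for illustration.

```python
def birth_death_stationary(lam, mu):
    """Solve (21.2.13) on states 0..n via p_k * mu_k = p_{k-1} * lam_{k-1}.

    lam[k] is the birth rate in state k (lam[n] must be 0 for a finite chain),
    mu[k] is the death rate in state k (mu[0] must be 0).
    """
    n = len(lam) - 1
    weights = [1.0]
    for k in range(1, n + 1):
        weights.append(weights[-1] * lam[k - 1] / mu[k])
    total = sum(weights)
    return [w / total for w in weights]


def balance_residuals(p, lam, mu):
    """Left-hand sides of the equations (21.2.13); they should all be zero."""
    n = len(p) - 1
    res = []
    for k in range(n + 1):
        inflow_below = p[k - 1] * lam[k - 1] if k > 0 else 0.0
        inflow_above = p[k + 1] * mu[k + 1] if k < n else 0.0
        res.append(inflow_below - p[k] * (lam[k] + mu[k]) + inflow_above)
    return res


# Hypothetical rates on the states 0, 1, 2, 3.
lam = [1.0, 1.0, 1.0, 0.0]
mu = [0.0, 2.0, 2.0, 2.0]
p = birth_death_stationary(lam, mu)
res = balance_residuals(p, lam, mu)
print(p, res)
```

With these rates every step multiplies the weight by $1/2$, so the probabilities are proportional to $1, 1/2, 1/4, 1/8$.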


Example 21.2.3 The telephone lines problem from queueing theory. Suppose we are given a system consisting of infinitely many communication channels which are used for telephone conversations. The probability that, for a busy channel, the transmitted conversation terminates during a small time interval $(t, t + \Delta)$ is equal to $\mu\Delta + o(\Delta)$. The probability that a request for a new conversation (a new call) arrives during the same time interval is $\lambda\Delta + o(\Delta)$. Thus the "arrival flow" of calls is nothing else but the Poisson process with parameter $\lambda$, and the number $\xi(t)$ of busy channels at time $t$ is the value of the birth-and-death process for which $\lambda_i = \lambda$ and $\mu_i = i\mu$.

In that case, it is not hard to verify with the help of Theorem 21.2.3 that there always exists a stationary limiting distribution, for which Eqs. (21.2.13) have the form

\[
\lambda p_0 = \mu p_1, \qquad
(\lambda + k\mu) p_k = \lambda p_{k-1} + (k + 1)\mu p_{k+1}, \quad k \ge 1.
\tag{21.2.14}
\]

From this we get that

\[
p_1 = p_0 \frac{\lambda}{\mu}, \qquad
p_k = \frac{1}{k!} \Bigl(\frac{\lambda}{\mu}\Bigr)^k p_0,
\tag{21.2.15}
\]

so that $p_0 = e^{-\lambda/\mu}$, and the limiting distribution will be the Poisson law with parameter $\lambda/\mu$.

If the number of channels $n$ is finite, the calls which find all the lines busy will be rejected, and in (21.2.13) one has to put $\lambda_n = 0$, $p_{n+1} = p_{n+2} = \cdots = 0$. In that case, the last equation in (21.2.14) will have the form $\mu n p_n = \lambda p_{n-1}$. Since the formulas (21.2.15) will remain true for $k \le n$, we obtain the so-called Erlang formulas for the stationary distribution:

\[
p_k = \frac{1}{k!} \Bigl(\frac{\lambda}{\mu}\Bigr)^k \Biggl[\sum_{j=0}^{n} \frac{1}{j!} \Bigl(\frac{\lambda}{\mu}\Bigr)^j\Biggr]^{-1}
\]

(the truncated Poisson distribution).
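
The Erlang formulas are easy to evaluate directly. The following sketch (function and variable names are illustrative, not from the text) computes the truncated Poisson distribution, checks the boundary equation $\mu n p_n = \lambda p_{n-1}$, and shows that for a large number of channels $p_0$ approaches the Poisson value $e^{-\lambda/\mu}$.

```python
from math import exp, factorial


def erlang_distribution(lam, mu, n):
    """Stationary law of the number of busy channels among n (truncated Poisson)."""
    rho = lam / mu
    weights = [rho ** k / factorial(k) for k in range(n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Two channels, equal arrival and service rates (hypothetical values).
small = erlang_distribution(lam=1.0, mu=1.0, n=2)
blocking = small[-1]  # probability that an arriving call finds all lines busy

# Many channels: the law is close to the Poisson law with parameter lam / mu.
large = erlang_distribution(lam=1.0, mu=1.0, n=60)

print(small, blocking, large[0], exp(-1.0))
```

For `n = 2` and $\lambda = \mu$ the weights are $1, 1, 1/2$, which gives the distribution $(0.4, 0.4, 0.2)$.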
The next example will be considered in a separate section.


21.3  Branching  Processes 


The essence of the mathematical model describing a branching process remains roughly the same as in Sect. 7.7.2. A continuous time branching process can be defined as follows. Let $\xi^{(i)}(t)$ denote the number of particles at time $t$ with the initial condition $\xi^{(i)}(0) = i$. Each particle, independently of all others, splits during the time interval $(t, t + \Delta)$ with probability $\mu\Delta + o(\Delta)$ into a random number $\eta \ne 1$ of particles (if $\eta = 0$, we say that the particle dies). Thus,

\[
\xi^{(i)}(t) = \xi_1^{(1)}(t) + \cdots + \xi_i^{(1)}(t), \tag{21.3.1}
\]

where $\xi_k^{(1)}(t)$ are independent and distributed as $\xi^{(1)}(t)$. Moreover,

\[
\begin{aligned}
p_{ij}(\Delta) &= i\mu\Delta h_{j-i+1} + o(\Delta), \quad j \ne i; \qquad h_k = \mathbf{P}(\eta = k); \quad h_1 = 0; \\
p_{ii}(\Delta) &= 1 - i\mu\Delta + o(\Delta),
\end{aligned}
\tag{21.3.2}
\]

so that here $q_{ij} = i\mu h_{j-i+1}$, $q_{ii} = -i\mu$.

By formula (21.3.2), $i\mu\Delta$ is the principal part of the probability that at least one particle will split. Clearly, the state 0 is absorbing. It will not be absorbing any more if one considers processes with immigration when a Poisson process (with intensity $\lambda$) of "outside" particles is added to the process $\xi^{(i)}(t)$. Then

\[
\begin{aligned}
p_{ij}(\Delta) &= i\mu\Delta h_{j-i+1} + o(\Delta) \quad \text{for } j - i \ne 0, 1, \\
p_{i,i+1}(\Delta) &= \Delta(i\mu h_2 + \lambda) + o(\Delta).
\end{aligned}
\]


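We return to the branching process (21.3.1), (21.3.2). By (21.3.1) we have

\[
r^{(i)}(t, z) := \mathbf{E} z^{\xi^{(i)}(t)} = \bigl[\mathbf{E} z^{\xi^{(1)}(t)}\bigr]^i = r^i(t, z) = \sum_{k=0}^{\infty} z^k p_{ik}(t),
\]

where

\[
r(t, z) := \mathbf{E} z^{\xi^{(1)}(t)} = \sum_{k=0}^{\infty} z^k p_{1k}(t). \tag{21.3.3}
\]

Equation (21.2.7) implies

\[
p'_{1k}(t) = \sum_{l=0}^{\infty} q_{1l}\, p_{lk}(t).
\]

Therefore, differentiating (21.3.3) with respect to $t$, we find that

\[
r'_t(t, z) = \sum_{k=0}^{\infty} z^k p'_{1k}(t) = \sum_{k=0}^{\infty} \sum_{l=0}^{\infty} q_{1l}\, p_{lk}(t) z^k
= \sum_{l=0}^{\infty} q_{1l} \sum_{k=0}^{\infty} z^k p_{lk}(t) = \sum_{l=0}^{\infty} q_{1l}\, r^l(t, z).
\tag{21.3.4}
\]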
Fig. 21.1 The form of the plot of the function $f_1$. The smaller root of the equation $f_1(q) = 0$ gives the probability of the eventual extinction of the branching process


But $q_{1l} = \mu h_l$ for $l \ne 1$, $q_{11} = -\mu$, and putting

\[
f(s) := \sum_{l=0}^{\infty} q_{1l} s^l = \mu(\mathbf{E} s^{\eta} - s) = \mu\Bigl(\sum_{l=0}^{\infty} h_l s^l - s\Bigr),
\]

we can write (21.3.4) in the form

\[
r'_t(t, z) = f(r(t, z)).
\]

We have obtained a differential equation for $r = r(t, z)$ (equivalent to (21.3.2)) which is more convenient to write in the form

\[
\frac{dr}{f(r)} = dt, \qquad
t = \int_{r(0,z)}^{r(t,z)} \frac{dy}{f(y)} = \int_{z}^{r(t,z)} \frac{dy}{f(y)}.
\]


Consider the behaviour of the function $f_1(y) = \mathbf{E} y^{\eta} - y$ on $[0, 1]$. Clearly, $f_1(0) = \mathbf{P}(\eta = 0)$, $f_1(1) = 0$, and

\[
f_1'(1) = \mathbf{E}\eta - 1, \qquad f_1''(y) = \mathbf{E}\eta(\eta - 1) y^{\eta - 2} > 0.
\]

Consequently, the function $f_1(y)$ is convex and has no zeros in $(0, 1)$ if $\mathbf{E}\eta \le 1$. When $\mathbf{E}\eta > 1$, there exists a point $q \in (0, 1)$ such that $f_1(q) = 0$, $f_1'(q) < 0$ (see Fig. 21.1), and $f_1(y) = (y - q) f_1'(q) + O((y - q)^2)$ in the vicinity of this point. Thus if $\mathbf{E}\eta > 1$, $z < q$ and $r \uparrow q$, then, by virtue of the representation

\[
\frac{1}{f_1(y)} = \frac{1}{(y - q) f_1'(q)} + O(1),
\]

we obtain

\[
t = \int_{z}^{r} \frac{dy}{f(y)} = \frac{1}{\mu f_1'(q)} \ln\Bigl(\frac{r - q}{z - q}\Bigr) + O(1).
\tag{21.3.5}
\]

This implies that, as $t \to \infty$,

\[
r(t, z) - q = (z - q) e^{\mu t f_1'(q) + O(1)} \sim (z - q) e^{\mu t f_1'(q)},
\]
\[
r(t, z) = q + O(e^{-\alpha t}), \qquad \alpha = -\mu f_1'(q) > 0.
\]

In particular, the extinction probability

\[
p_{10}(t) = r(t, 0) = q + O(e^{-\alpha t})
\]

converges exponentially fast to $q$, $p_{10}(\infty) = q$. Comparing our results with those from Sect. 7.7, the reader can see that the extinction probability for a discrete time branching process had the same value (we could also come to this conclusion directly). Since $p_{k0}(t) = [p_{10}(t)]^k$, one has $p_{k0}(\infty) = q^k$.

It follows from (21.3.5) that the remaining "probability mass" of the distribution of $\xi(t)$ quickly moves to infinity as $t \to \infty$.

If $\mathbf{E}\eta < 1$, the above argument remains valid with $q$ replaced with 1, so that the extinction probability is $p_{10}(\infty) = p_{k0}(\infty) = 1$.

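If $\mathbf{E}\eta = 1$, then

\[
f_1(y) = \frac{(y - 1)^2}{2} f_1''(1) + O((y - 1)^3),
\]
\[
t = \int_{z}^{r} \frac{dy}{f(y)} \sim -\frac{2}{\mu f_1''(1)} \cdot \frac{1}{r - 1}, \qquad
r(t, z) - 1 \sim -\frac{2}{\mu t f_1''(1)}.
\]

Thus the extinction probability $r(t, 0) = p_{10}(t)$ also tends to 1 in this case.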
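
The convergence $p_{10}(t) \to q$ can be observed by integrating the equation $r' = \mu f_1(r)$ numerically from $r(0, 0) = 0$. The sketch below uses a classical fourth-order Runge-Kutta step and a hypothetical offspring law with $h_0 = 1/4$, $h_2 = 3/4$ (so $\mathbf{E}\eta = 3/2 > 1$); for this law $f_1(y) = \tfrac14 - y + \tfrac34 y^2$ has the roots $1/3$ and $1$, hence $q = 1/3$.

```python
def f1(y, h):
    """f_1(y) = E y^eta - y for the offspring law h[k] = P(eta = k)."""
    return sum(hk * y ** k for k, hk in enumerate(h)) - y


def extinction_probability(h, mu, t_end, steps):
    """Integrate r' = mu * f1(r), r(0) = 0, up to t_end with RK4; returns p_10(t_end)."""
    dt = t_end / steps
    r = 0.0
    for _ in range(steps):
        k1 = mu * f1(r, h)
        k2 = mu * f1(r + 0.5 * dt * k1, h)
        k3 = mu * f1(r + 0.5 * dt * k2, h)
        k4 = mu * f1(r + dt * k3, h)
        r += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return r


# Hypothetical offspring law: a particle dies (0 children) or splits in two; h_1 = 0.
h = [0.25, 0.0, 0.75]
p10_early = extinction_probability(h, mu=1.0, t_end=2.0, steps=200)
p10_late = extinction_probability(h, mu=1.0, t_end=40.0, steps=4000)
print(p10_early, p10_late)
```

Here $\alpha = -\mu f_1'(1/3) = 1/2$, so by $t = 40$ the remaining distance to $q$ is of order $e^{-20}$.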


21.4  Semi-Markov  Processes 

21.4.1  Semi-Markov  Processes  on  the  States  of  a  Chain 

Semi-Markov processes can be described as follows. Let an aperiodic discrete time irreducible Markov chain $\{X_n\}$ with the state space $\mathcal{X} = \{0, 1, 2, \ldots\}$ be given. To each state $i$ we put into correspondence the distribution $F_i(t)$ of a positive random variable $\zeta^{(i)}$:

\[
F_i(t) = \mathbf{P}(\zeta^{(i)} < t).
\]

Consider independent of the chain $\{X_n\}$ and of each other the sequences $\zeta_1^{(i)}, \zeta_2^{(i)}, \ldots$; $\zeta_j^{(i)} \overset{d}{=} \zeta^{(i)}$, of independent random variables with the distribution $F_i$. Let, moreover, the distribution of the initial random vector $(X_0, \zeta_0)$, $X_0 \in \mathcal{X}$, $\zeta_0 \ge 0$, be given. The evolution of the semi-Markov process $\xi(u)$ is described as follows:

\[
\begin{aligned}
\xi(u) &= X_0 \quad \text{for } 0 \le u < \zeta_0, \\
\xi(u) &= X_1 \quad \text{for } \zeta_0 \le u < \zeta_0 + \zeta_1^{(X_1)}, \\
\xi(u) &= X_2 \quad \text{for } \zeta_0 + \zeta_1^{(X_1)} \le u < \zeta_0 + \zeta_1^{(X_1)} + \zeta_2^{(X_2)}, \\
\xi(u) &= X_n \quad \text{for } Z_{n-1} \le u < Z_n, \qquad Z_n = \zeta_0 + \zeta_1^{(X_1)} + \cdots + \zeta_n^{(X_n)},
\end{aligned}
\tag{21.4.1}
\]

and so on. Thus, upon entering state $X_n = j$, the trajectory of $\xi(u)$ remains in that state for a random time $\zeta_n^{(X_n)} = \zeta_n^{(j)}$, then switches to state $X_{n+1}$ and so on. It is evident that such a process is, generally speaking, not Markovian. It will be a Markov process only if

\[
1 - F_i(t) = e^{-q_i t}, \qquad q_i > 0,
\]

and will then coincide with the process described in Sect. 21.2.
Fig. 21.2 The trajectories of the semi-Markov process $\xi(t)$ and of the residual sojourn time process $\chi(t)$


If the distribution $F_i$ is not exponential, then, given the value $\xi(t) = i$, the time between $t$ and the next jump epoch will depend on the epoch of the preceding jump of $\xi(\cdot)$, because

\[
\mathbf{P}(\zeta^{(i)} > v + u \mid \zeta^{(i)} > v) = \frac{1 - F_i(v + u)}{1 - F_i(v)}
\]

for non-exponential $F_i$ depends on $v$. It is this property that means that the process is non-Markovian, for fixing the "present" (i.e. the value of $\xi(t)$) does not make the "future" of the process $\xi(u)$ independent of the "past" (i.e. of the trajectory of $\xi(u)$ for $u < t$).

The process $\xi(t)$ can be "complemented" to a Markov one by adding to it the component $\chi(t)$ of which the value gives the time $u$ for which the trajectory $\xi(t + u)$, $u \ge 0$, will remain in the current state $\xi(t)$. In other words, $\chi(t)$ is the excess of level $t$ for the random walk $Z_0, Z_1, \ldots$ (see Fig. 21.2):

\[
\chi(t) = Z_{\nu(t)+1} - t, \qquad \nu(t) = \max\{k : Z_k \le t\}.
\]

The process $\chi(t)$ is Markovian and has "saw-like" trajectories deterministic inside the intervals $(Z_k, Z_{k+1})$. The process $X(t) = (\xi(t), \chi(t))$ is obviously Markovian, since the value of $X(t)$ uniquely determines the law of evolution of the process $X(t + u)$ for $u \ge 0$ whatever the "history" $X(v)$, $v < t$, is. Similarly, we could consider the Markov process $Y(t) = (\xi(t), \gamma(t))$, where $\gamma(t)$ is the defect of level $t$ for the walk $Z_0, Z_1, \ldots$:

\[
\gamma(t) = t - Z_{\nu(t)}.
\]


21.4.2  The  Ergodic  Theorem 

In  the  sequel,  we  will  distinguish  between  the  following  two  cases. 
(A) The arithmetic case when the possible values of $\zeta^{(i)}$, $i = 0, 1, \ldots$, are multiples of a certain value $h$ which can be assumed without loss of generality to be equal to 1. In that case we will also assume that the g.c.d. of the possible values of the sums of the variables $\zeta^{(i)}$ is also equal to $h = 1$. This is clearly equivalent to assuming that the g.c.d. of the possible values of recurrence times $\theta^{(i)}$ of $\xi(t)$ to the state $i$ is equal to 1 for any fixed $i$.

(NA) The non-arithmetic case, when condition (A) does not hold.

Put $a_i := \mathbf{E}\zeta^{(i)}$.


Theorem 21.4.1 Let the Markov chain $\{X_n\}$ be ergodic (satisfy the conditions of Theorem 13.4.1) and $\{\pi_j\}$ be the stationary distribution of that chain. Then, in the non-arithmetic case (NA), for any initial distribution $(\zeta_0, X_0)$ there exists the limit

\[
\lim_{t \to \infty} \mathbf{P}(\xi(t) = i,\ \chi(t) > v) = \frac{\pi_i}{\sum_j \pi_j a_j} \int_v^{\infty} \mathbf{P}(\zeta^{(i)} > u)\,du.
\tag{21.4.2}
\]

In the arithmetic case (A), (21.4.2) holds for integer-valued $v$ (the integral becomes a sum in that case). It follows from (21.4.2) that the following limit exists

\[
\lim_{t \to \infty} \mathbf{P}(\xi(t) = i) = \frac{\pi_i a_i}{\sum_j \pi_j a_j}.
\]


Proof For definiteness we restrict ourselves to the non-arithmetic case (NA). In Sect. 13.4 we considered the times $\tau^{(i)}$ between consecutive visits of $\{X_n\}$ to state $i$. These times could be called "embedded", as well as the chain $\{X_n\}$ itself in regard to the process $\xi(t)$. Along with the times $\tau^{(i)}$, we will need the "real" times $\theta^{(i)}$ between the visits of the process $\xi(t)$ to the state $i$. Let, for instance, $X_1 = 1$. Then

\[
\theta^{(1)} = \zeta_1^{(X_1)} + \zeta_2^{(X_2)} + \cdots + \zeta_{\tau}^{(X_{\tau})},
\]

where $\tau = \tau^{(1)}$. For definiteness and to reduce notation, we fix for the moment the value $i = 1$ and put $\theta^{(1)} =: \theta$. Let first

\[
\zeta_0 \overset{d}{=} \zeta^{(1)}, \qquad X_0 = 1. \tag{21.4.3}
\]

Then the whole trajectory of the process $X(t)$ for $t \ge 0$ will be divided into identically distributed independent cycles by the epochs when the process hits the state $\xi(t) = 1$. We denote the lengths of these cycles by $\theta_1, \theta_2, \ldots$; they are independent and identically distributed. We show that

\[
\mathbf{E}\theta = \frac{1}{\pi_1} \sum_j a_j \pi_j. \tag{21.4.4}
\]

Denote by $\theta(n)$ the "real" time spent on $n$ transitions of the governing chain $\{X_n\}$. Then

\[
\theta_1 + \cdots + \theta_{\eta(n)-1} \le \theta(n) \le \theta_1 + \cdots + \theta_{\eta(n)}, \tag{21.4.5}
\]

where $\eta(n) := \min\{k : T_k > n\}$, $T_k = \sum_{j=1}^{k} \tau_j$, $\tau_j$ are independent and distributed as $\tau$. We prove that, as $n \to \infty$,

\[
\mathbf{E}\theta(n) \sim n\pi_1 \mathbf{E}\theta. \tag{21.4.6}
\]
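By Wald's identity and (21.4.5),

\[
\mathbf{E}\theta(n) \le \mathbf{E}\theta\, \mathbf{E}\eta(n), \tag{21.4.7}
\]

where $\mathbf{E}\eta(n) \sim n/\mathbf{E}\tau = n\pi_1$.

Now we bound from below the expectation $\mathbf{E}\theta(n)$. Put $m := \lfloor n\pi_1 - \varepsilon n \rfloor$, $\Theta_n := \sum_{j=1}^{n} \theta_j$. Then

\[
\mathbf{E}\theta(n) \ge \mathbf{E}(\theta(n);\ \eta(n) > m)
\ge \mathbf{E}(\Theta_m;\ \eta(n) > m) = m\mathbf{E}\theta - \mathbf{E}(\Theta_m;\ \eta(n) \le m).
\tag{21.4.8}
\]

Here the random variable $\Theta_m/m \ge 0$ possesses the properties

\[
\Theta_m/m \overset{p}{\to} \mathbf{E}\theta \quad \text{as } m \to \infty, \qquad \mathbf{E}(\Theta_m/m) = \mathbf{E}\theta.
\]

Therefore it satisfies the conditions of part 4 of Lemma 6.1.1 and is uniformly integrable. This, in turn, by Lemma 6.1.2 and convergence $\mathbf{P}(\eta(n) \le m) \to 0$ means that the last term on the right-hand side of (21.4.8) is $o(m)$. By virtue of (21.4.8), since $\varepsilon > 0$ is arbitrary, we obtain that

\[
\liminf_{n \to \infty} n^{-1} \mathbf{E}\theta(n) \ge \pi_1 \mathbf{E}\theta.
\]

This together with (21.4.7) proves (21.4.6).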

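Now we will calculate the value of $\mathbf{E}\theta(n)$ using another approach. The variable $\theta(n)$ admits the representation

\[
\theta(n) = \sum_j \bigl(\zeta_1^{(j)} + \cdots + \zeta_{N(j,n)}^{(j)}\bigr),
\]

where $N(j, n)$ is the number of visits of the trajectory of $\{X_k\}$ to the state $j$ during the first $n$ steps. Since $\{\zeta_k^{(j)}\}_{k=1}^{\infty}$ and $N(j, n)$ are independent for each $j$, we have

\[
\mathbf{E}\theta(n) = \sum_j a_j \mathbf{E}N(j, n), \qquad \mathbf{E}N(j, n) = \sum_{k=1}^{n} p_{1j}(k).
\]

Because $p_{1j}(k) \to \pi_j$ as $k \to \infty$, one has

\[
\lim_{n \to \infty} n^{-1} \mathbf{E}N(j, n) = \pi_j.
\]

Moreover,

\[
\pi_j = \sum_l \pi_l\, p_{lj}(k) \ge \pi_1 p_{1j}(k)
\]

and, therefore,

\[
p_{1j}(k) \le \pi_j/\pi_1.
\]

Hence

\[
n^{-1} \mathbf{E}N(j, n) \le \pi_j/\pi_1,
\]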
and in the case when $\sum_j a_j \pi_j < \infty$, the series $\sum_j a_j n^{-1} \mathbf{E}N(j, n)$ converges uniformly in $n$. Consequently, the following limit exists

\[
\lim_{n \to \infty} n^{-1} \mathbf{E}\theta(n) = \sum_j a_j \pi_j.
\]

Comparing this with (21.4.6) we obtain (21.4.4). If $\mathbf{E}\theta = \infty$ then clearly $\mathbf{E}\theta(n) = \infty$ and $\sum_j a_j \pi_j = \infty$, and vice versa, if $\sum_j a_j \pi_j = \infty$ then $\mathbf{E}\theta = \infty$.

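Consider now the random walk $\{\Theta_k\}$. To the $k$-th cycle there correspond $T_k$ transitions. Therefore, by the total probability formula,

\[
\mathbf{P}(\xi(t) = 1,\ \chi(t) > v) = \sum_{k=1}^{\infty} \int_0^t \mathbf{P}\bigl(\Theta_k \in du,\ \zeta_{T_k+1}^{(1)} > t - u + v\bigr),
\]

where $\zeta_{T_k+1}^{(1)}$ is independent of $\Theta_k$ and distributed as $\zeta^{(1)}$ (see Lemma 11.2.1 or the strong Markov property). Therefore, denoting by $H_{\theta}(u) := \sum_{k=1}^{\infty} \mathbf{P}(\Theta_k < u)$ the renewal function for the sequence $\{\theta_k\}$, we obtain for the non-arithmetic case (NA), by virtue of the renewal theorem (see Theorem 10.4.1 and (10.4.2)), that, as $t \to \infty$,

\[
\begin{aligned}
\mathbf{P}(\xi(t) = 1,\ \chi(t) > v)
&= \int_0^t dH_{\theta}(u)\, \mathbf{P}(\zeta^{(1)} > t - u + v) \\
&\to \frac{1}{\mathbf{E}\theta} \int_0^{\infty} \mathbf{P}(\zeta^{(1)} > u + v)\,du
= \frac{1}{\mathbf{E}\theta} \int_v^{\infty} \mathbf{P}(\zeta^{(1)} > u)\,du.
\end{aligned}
\tag{21.4.9}
\]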


We have proved assertion (21.4.2) for $i = 1$ and initial conditions (21.4.3). The transition to arbitrary initial conditions is quite obvious and is done in exactly the same way as in the proof of the ergodic theorems of Chap. 13.

If $\sum_j a_j \pi_j = \infty$ then, as we have already observed, $\mathbf{E}\theta = \infty$ and, by the renewal theorem and (21.4.9), one has $\mathbf{P}(\xi(t) = 1,\ \chi(t) > v) \to 0$ as $t \to \infty$. It remains to note that instead of $i = 1$ we can fix any other value of $i$. The theorem is proved. □
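
The limit $\lim_{t\to\infty} \mathbf{P}(\xi(t) = i) = \pi_i a_i / \sum_j \pi_j a_j$ following from (21.4.2) says that the embedded stationary probabilities are reweighted by the mean sojourn times. A minimal sketch of that computation, for a hypothetical three-state chain and hypothetical mean sojourn times $a = (1, 2, 3)$ (neither is taken from the text):

```python
def embedded_stationary(p, iters=2000):
    """Stationary distribution of an ergodic finite chain by power iteration."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[k] * p[k][j] for k in range(n)) for j in range(n)]
    return pi


def semi_markov_limits(p, a):
    """Limits of P(xi(t) = i): pi_i * a_i normalised by sum_j pi_j * a_j."""
    pi = embedded_stationary(p)
    weights = [pi_i * a_i for pi_i, a_i in zip(pi, a)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical aperiodic irreducible jump chain and mean sojourn times a_i = E zeta^(i).
p = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0]]
a = [1.0, 2.0, 3.0]

limits = semi_markov_limits(p, a)
print(limits)
```

Here $\pi = (4/9, 3/9, 2/9)$, so the weights $\pi_i a_i$ are proportional to $4, 6, 6$ and the limiting probabilities are $(0.25, 0.375, 0.375)$: state 0 is visited most often by the chain, yet the process spends less time there because its sojourns are short.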


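In the same way we could also prove that

\[
\lim_{t \to \infty} \mathbf{P}(\xi(t) = i,\ \gamma(t) > v) = \frac{\pi_i}{\sum_j a_j \pi_j} \int_v^{\infty} \mathbf{P}(\zeta^{(i)} > y)\,dy,
\]
\[
\lim_{t \to \infty} \mathbf{P}(\xi(t) = i,\ \chi(t) > u,\ \gamma(t) > v) = \frac{\pi_i}{\sum_j a_j \pi_j} \int_{u+v}^{\infty} \mathbf{P}(\zeta^{(i)} > y)\,dy
\]

(see Theorem 10.4.3).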


21.4.3  Semi-Markov  Processes  on  Chain  Transitions 

Along with the semi-Markov processes $\xi(t)$ described at the beginning of the present section, one sometimes considers semi-Markov processes "given on the transitions" of the chain $\{X_n\}$. In that case, the distributions $F_{ij}$ of random variables $\zeta^{(ij)} > 0$ are given and, similarly to (21.4.1), for the initial condition $(X_0, X_1, \zeta_0)$ one puts

\[
\begin{aligned}
\xi(u) &:= (X_0, X_1) \quad \text{for } 0 \le u < \zeta_0, \\
\xi(u) &:= (X_1, X_2) \quad \text{for } \zeta_0 \le u < \zeta_0 + \zeta_1^{(X_0, X_1)}, \\
\xi(u) &:= (X_2, X_3) \quad \text{for } \zeta_0 + \zeta_1^{(X_0, X_1)} \le u < \zeta_0 + \zeta_1^{(X_0, X_1)} + \zeta_2^{(X_1, X_2)},
\end{aligned}
\tag{21.4.10}
\]

and so on. Although at first glance this is a very general model, it can be completely reduced to the semi-Markov processes (21.4.1). To that end, one has to notice that the "two-dimensional" sequence $Y_n = (X_n, X_{n+1})$, $n = 0, 1, \ldots$, also forms a Markov chain. Its transition probabilities have the form

\[
p_{(ij)(kl)} =
\begin{cases}
p_{jl} & \text{for } k = j, \\
0 & \text{for } k \ne j,
\end{cases}
\qquad
p_{(ij)(kl)}(n) = p_{jk}(n - 1)\, p_{kl} \quad \text{for } n > 1,
\]

so that if the chain $\{X_n\}$ is ergodic, then $\{Y_n\}$ is also ergodic and

\[
p_{(ij)(kl)}(n) \to \pi_k p_{kl}.
\]

This enables one to restate Theorem 21.4.1 easily for the semi-Markov processes (21.4.10) given on the transitions of the Markov chain $\{X_n\}$, since the process (21.4.10) will be an ordinary semi-Markov process given on the chain $\{Y_n\}$.
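
The reduction to the pair chain $Y_n = (X_n, X_{n+1})$ can be checked on a small example. The sketch below builds the transition probabilities of $\{Y_n\}$ from a hypothetical matrix $p$ (the same illustrative matrix as one might use for any three-state chain; it is not from the text) and verifies that $\rho_{(k,l)} = \pi_k p_{kl}$ is invariant for $\{Y_n\}$. The vector $\pi = (4/9, 3/9, 2/9)$ is the stationary law of this particular $p$, computed by hand.

```python
def pair_chain(p):
    """Transition law of Y_n = (X_n, X_{n+1}): (i, j) -> (j, l) with probability p[j][l]."""
    n = len(p)
    states = [(i, j) for i in range(n) for j in range(n) if p[i][j] > 0]
    trans = {s: {} for s in states}
    for (i, j) in states:
        for l in range(n):
            if p[j][l] > 0:
                trans[(i, j)][(j, l)] = p[j][l]
    return states, trans


# Hypothetical ergodic chain and its stationary distribution (found by hand for this p).
p = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.5, 0.5, 0.0]]
pi = [4.0 / 9.0, 3.0 / 9.0, 2.0 / 9.0]

states, trans = pair_chain(p)

# Candidate stationary law of the pair chain: rho(k, l) = pi_k * p_kl.
rho = {(k, l): pi[k] * p[k][l] for (k, l) in states}

# One step of the pair chain applied to rho; invariance means rho_next == rho.
rho_next = {s: 0.0 for s in states}
for s, row in trans.items():
    for target, prob in row.items():
        rho_next[target] += rho[s] * prob

max_diff = max(abs(rho_next[s] - rho[s]) for s in states)
print(states, max_diff)
```

The invariance is just the identity $\sum_i \pi_i p_{ik} p_{kl} = \pi_k p_{kl}$ written out over the admissible pairs.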


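Corollary 21.4.1 If the chain $\{X_n\}$ is ergodic then, in the non-arithmetic case,

\[
\lim_{t \to \infty} \mathbf{P}(\xi(t) = (i, j),\ \chi(t) > v)
= \frac{\pi_i p_{ij}}{\sum_{k,l} a_{kl} \pi_k p_{kl}} \int_v^{\infty} \mathbf{P}(\zeta^{(ij)} > u)\,du, \qquad a_{kl} = \mathbf{E}\zeta^{(kl)}.
\]

In the arithmetic case $v$ must be a multiple of the lattice span.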


We will make one more remark which could be helpful when studying semi-Markov processes and which concerns the so-called semi-Markov renewal functions $H_{ij}(t)$. Denote by $T_{ij}(n)$ the epoch (in the "real time") of the $n$-th jump of the process $\xi(t)$ from state $i$ to $j$. Put

\[
H_{ij}(t) := \sum_{n=1}^{\infty} \mathbf{P}(T_{ij}(n) < t).
\]

If $\nu_{ij}(t)$ is the number of jumps from state $i$ to $j$ during the time interval $[0, t)$, then clearly $H_{ij}(t) = \mathbf{E}\nu_{ij}(t)$.

Set $\Delta f(t) := f(t + \Delta) - f(t)$, $\Delta > 0$.


Corollary 21.4.2 In the non-arithmetic case,

\[
\lim_{t \to \infty} \Delta H_{ij}(t) = \frac{\pi_i p_{ij} \Delta}{\sum_l a_l \pi_l}.
\tag{21.4.11}
\]

In the arithmetic case $\Delta$ must be a multiple of the lattice span.
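Proof Denote by $\nu_{ij}^{(k)}(u)$ the number of transitions of the process $\xi(t)$ from $i$ to $j$ during the time interval $(0, u)$ given the initial condition $(k, 0)$. Then, by the total probability formula,

\[
\mathbf{E}\Delta\nu_{ij}(t) = \int_0^{\Delta} \sum_k \mathbf{P}(\xi(t) = k,\ \chi(t) \in du)\, \mathbf{E}\nu_{ij}^{(k)}(\Delta - u).
\]

Since $\nu_{ij}^{(k)}(u) \le \nu_{ij}^{(i)}(u)$, by Theorem 21.4.1 one has

\[
h_{ij}(\Delta) := \lim_{t \to \infty} \mathbf{E}\Delta\nu_{ij}(t) = \frac{1}{\sum_l a_l \pi_l} \sum_k \pi_k \int_0^{\Delta} \mathbf{P}(\zeta^{(k)} > u)\, \mathbf{E}\nu_{ij}^{(k)}(\Delta - u)\,du.
\tag{21.4.12}
\]

Further,

\[
\mathbf{P}(\zeta^{(i)} < \Delta - u) \le F_i(\Delta) \to 0
\]

as $\Delta \to 0$, and

\[
\begin{aligned}
\mathbf{P}\bigl(\nu_{ij}^{(k)}(\Delta - u) = s\bigr) &\le (p_{ij} F_i(\Delta))^s, \quad k \ne i, \\
\mathbf{P}\bigl(\nu_{ij}^{(i)}(\Delta - u) = s + 1\bigr) &\le (p_{ij} F_i(\Delta))^s, \quad s \ge 1, \\
\mathbf{P}\bigl(\nu_{ij}^{(i)}(\Delta - u) = 1\bigr) &= p_{ij} + o(F_i(\Delta)).
\end{aligned}
\]

It follows from the aforesaid that

\[
\mathbf{E}\nu_{ij}^{(k)}(\Delta - u) = o(F_i(\Delta)), \qquad \mathbf{E}\nu_{ij}^{(i)}(\Delta - u) = p_{ij} + o(F_i(\Delta)).
\]

Therefore,

\[
h_{ij}(\Delta) = \frac{\pi_i p_{ij} \Delta}{\sum_l a_l \pi_l} + o(\Delta).
\tag{21.4.13}
\]

Further, from the equality

\[
H_{ij}(t + 2\Delta) - H_{ij}(t) = \Delta H_{ij}(t) + \Delta H_{ij}(t + \Delta)
\]

we obtain that $h_{ij}(2\Delta) = 2h_{ij}(\Delta)$, which means that $h_{ij}(\Delta)$ is linear. Together with (21.4.13) this proves (21.4.11). The corollary is proved. □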


The class of processes for which one can prove ergodicity using the same methods as those used for semi-Markov processes and in Chap. 13 can be somewhat extended. For this broader class of processes we will prove in the next section the ergodic theorem, and also the laws of large numbers and the central limit theorem for integrals of such processes.
21.5 Regenerative Processes

21.5.1  Regenerative  Processes.  The  Ergodic  Theorem 


Let $X(t)$ and $X_0(t)$; $t \ge 0$, be processes given in the space $D(0, \infty)$ of functions without discontinuities of the second type (the state space of these processes could be any metric space, not necessarily the real line). The process $X(t)$ is said to be regenerative if it possesses the following properties:

(1) There exists a state $x_0$ which is visited by the process $X$ with probability 1. After each such visit, the evolution of the process starts anew as if it were the original process $X(t)$ starting at the state $X(0) = x_0$. We will denote this new process by $X_0(t)$ where $X_0(0) = x_0$. To state this property more precisely, we introduce the time $\tau_0$ of the first visit to $x_0$ by $X$:

\[
\tau_0 := \inf\{t \ge 0 : X(t) = x_0\}.
\]

However, it is not clear from this definition whether $\tau_0$ is a random variable. For definiteness, assume that the process $X$ is such that for $\tau_0$ one has

\[
\{\tau_0 > t\} = \bigcup_n \bigcap_{t_k \in S} \Bigl\{|X(t_k) - x_0| > \frac{1}{n}\Bigr\},
\]

where $S$ is a countable set everywhere dense in $[0, t]$. In that case the set $\{\tau_0 > t\}$ is clearly an event and $\tau_0$ is a random variable. The above stated property means that $\tau_0$ is a proper random variable: $\mathbf{P}(\tau_0 < \infty) = 1$, and that the distribution of $X(\tau_0 + u)$, $u \ge 0$, coincides with that of $X_0(u)$, $u \ge 0$, whatever the "history" of the process $X(t)$, $t \le \tau_0$.

(2) The recurrence time $\tau$ of the state $x_0$ has finite expectation $\mathbf{E}\tau < \infty$, $\tau := \inf\{t : X_0(t) = x_0\}$.

The aforesaid means that the evolution of the process is split into independent identically distributed cycles by its visits to the state $x_0$. The visit times to $x_0$ are called regeneration times. The behaviour of the process inside the cycles may be arbitrary, and no further conditions, including Markovity, are imposed.

We introduce the so-called "taboo probability"

\[
P(t, B) := \mathbf{P}(X_0(t) \in B,\ \tau > t).
\]

We will assume that, as a function of $t$, $P(t, B)$ is measurable and Riemann integrable.


Theorem 21.5.1 Let $X(t)$ be a regenerative process and the random variable $\tau$ be non-lattice. Then, for any Borel set $B$, as $t \to \infty$,

\[
\mathbf{P}(X(t) \in B) \to \pi(B) = \frac{1}{\mathbf{E}\tau} \int_0^{\infty} P(u, B)\,du.
\]

If $\tau$ is a lattice variable (which is the case for processes $X(t)$ in discrete time), the assertion holds true with the following obvious changes: $t \to \infty$ along the lattice and the integral is replaced with a sum.
Proof Let $T_0 := 0$, $T_k := \tau_1 + \cdots + \tau_k$ be the epoch of the $k$-th regeneration of the process $X_0(t)$, and

\[
H(u) := \sum_{k=0}^{\infty} \mathbf{P}(T_k < u)
\]

($\tau_k \overset{d}{=} \tau$ are independent). Then, using the total probability formula and the key renewal theorem, we obtain, as $t \to \infty$,

\[
\mathbf{P}(X_0(t) \in B) = \sum_{k=0}^{\infty} \int_0^t \mathbf{P}(T_k \in du)\, P(t - u, B)
= \int_0^t dH(u)\, P(t - u, B) \to \frac{1}{\mathbf{E}\tau} \int_0^{\infty} P(u, B)\,du = \pi(B).
\]

For the process $X(t)$ one gets

\[
\mathbf{P}(X(t) \in B) = \int_0^t \mathbf{P}(\tau_0 \in du)\, \mathbf{P}(X_0(t - u) \in B) \to \pi(B).
\]

The theorem is proved. □


21.5.2  The  Laws  of  Large  Numbers  and  Central  Limit  Theorem 
for  Integrals  of  Regenerative  Processes 

Consider a measurable mapping $f : \mathcal{X} \to \mathbb{R}$ of the state space $\mathcal{X}$ of a process $X(t)$ to the real line $\mathbb{R}$. As in Sect. 21.4.2, for the sake of simplicity, we can assume that $\mathcal{X} = \mathbb{R}$ and the trajectories of $X(t)$ lie in the space $D(0, \infty)$ of functions without discontinuities of the second kind. In this case the paths $f(X(u))$, $u \ge 0$, will be measurable functions, for which the integral

\[
S(t) = \int_0^t f(X(u))\,du
\]

is well defined. For such integrals we have the following law of large numbers. Set

\[
\zeta := \int_0^{\tau} f(X_0(u))\,du, \qquad a := \mathbf{E}\tau.
\]

Theorem 21.5.2 Let the conditions of Theorem 21.5.1 be satisfied and there exist $a_{\zeta} := \mathbf{E}\zeta$. Then, as $t \to \infty$,

\[
\frac{S(t)}{t} \overset{p}{\to} \frac{a_{\zeta}}{a}.
\]

For conditions of existence of $\mathbf{E}\zeta$, see Theorem 21.5.4 below.




Proof The proof of the theorem largely repeats that of the similar assertion (Theorem 13.8.1) for sums of random variables defined on a Markov chain. Divide the domain $u \ge 0$ into half-intervals

\[
(0, T_0], \quad (T_{k-1}, T_k], \quad k \ge 1, \quad T_0 = \tau_0,
\]

where $T_k$ are the epochs of hitting the state $x_0$ by the process $X(t)$, $\tau_k = T_k - T_{k-1}$ for $k \ge 1$ are independent and distributed as $\tau$. Then the random variables

\[
\zeta_k = \int_{T_{k-1}}^{T_k} f(X(u))\,du, \quad k \ge 1,
\]

are independent, distributed as $\zeta$, and have finite expectation $a_{\zeta}$. The integral $S(t)$ can be represented as

\[
S(t) = z_0 + \sum_{k=1}^{\nu(t)} \zeta_k + z_t,
\]

where

\[
\nu(t) := \max\{k : T_k \le t\}, \qquad z_0 := \int_0^{T_0} f(X(u))\,du, \qquad z_t := \int_{T_{\nu(t)}}^{t} f(X(u))\,du.
\]

Since $\tau_0$ is a proper random variable, $z_0$ is a proper random variable as well, and hence $z_0/t \overset{a.s.}{\to} 0$ as $t \to \infty$. Further,

\[
z_t \overset{d}{=} \int_0^{\gamma(t)} f(X_0(u))\,du,
\]

where $\gamma(t) = t - T_{\nu(t)}$ has a proper limiting distribution as $t \to \infty$ (see Chap. 10), so $z_t/t \overset{p}{\to} 0$ as $t \to \infty$. The sum $S_{\nu(t)} = \sum_{k=1}^{\nu(t)} \zeta_k$ is nothing else but the generalised renewal process studied in Chaps. 10 and 11. By Theorem 11.5.2, as $t \to \infty$,

\[
\frac{S_{\nu(t)}}{t} \overset{p}{\to} \frac{a_{\zeta}}{a}.
\]

The theorem is proved. □
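
The limit $a_{\zeta}/a$ can be observed in a simple simulation. In the sketch below all modelling choices are assumptions made for illustration: the cycle lengths $\tau$ are uniform on $(0, 2)$ (a non-lattice law with $a = 1$), the process inside a cycle is the time elapsed since the last regeneration, and $f(x) = x$, so that $\zeta = \tau^2/2$ and $a_{\zeta} = \mathbf{E}\tau^2/2 = 2/3$. The ratio $S(t)/t$ is evaluated at a regeneration epoch $t = T_N$, which avoids the boundary terms $z_0$ and $z_t$.

```python
import random


def integral_ratio(n_cycles, seed=0):
    """Return S(T_N) / T_N for the 'age' process with Uniform(0, 2) cycle lengths."""
    rng = random.Random(seed)
    total_time = 0.0      # T_N: sum of the cycle lengths
    total_integral = 0.0  # S(T_N): sum of the per-cycle integrals zeta_k
    for _ in range(n_cycles):
        tau = rng.uniform(0.0, 2.0)
        total_time += tau
        # Integral of f(X_0(u)) = u over one cycle [0, tau].
        total_integral += tau * tau / 2.0
    return total_integral / total_time


ratio = integral_ratio(100_000)
print(ratio)  # should be close to a_zeta / a = 2/3
```

With $10^5$ cycles the standard deviation of the ratio is below $10^{-3}$, so the printed value should agree with $2/3$ to about two or three decimal places.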


In order to prove the strong law of large numbers we need a somewhat more restrictive condition than that in Theorem 21.5.2. Put

\[
\zeta^* := \int_0^{\tau} |f(X_0(u))|\,du.
\]

Theorem 21.5.3 Let the conditions of Theorem 21.5.1 be satisfied and $\mathbf{E}\zeta^* < \infty$. Then

\[
\frac{S(t)}{t} \overset{a.s.}{\to} \frac{a_{\zeta}}{a}.
\]

The proof essentially repeats (as was the case for Theorem 21.5.2) that of the law of large numbers for sums of random variables defined on a Markov chain (see Theorem 13.8.3). One only needs to use, instead of (13.8.18), the relation

\[
\sup_{T_k \le u \le T_{k+1}} \Bigl|\int_{T_k}^{u} f(X(v))\,dv\Bigr| \le \int_{T_k}^{T_{k+1}} |f(X(v))|\,dv
\]

and the fact that $\mathbf{E}\zeta^* < \infty$. The theorem is proved. □


Here an analogue of Theorem 13.8.2, in which the conditions of existence of $\mathbf{E}\zeta^*$ and $\mathbf{E}\zeta$ are elucidated, is the following.

Theorem 21.5.4 (Generalisation of Wald's identity) Let the conditions of Theorem 21.5.1 be met and there exist

\[
\mathbf{E}|f(X(\infty))| = \int |f(x)|\,\pi(dx),
\]

where $X(\infty)$ is a random variable with the stationary distribution $\pi$. Then there exist

\[
\mathbf{E}\zeta^* = \mathbf{E}\tau\, \mathbf{E}|f(X(\infty))|, \qquad \mathbf{E}\zeta = \mathbf{E}\tau\, \mathbf{E}f(X(\infty)).
\]

The proof of Theorem 21.5.4 repeats, with obvious changes, that of Theorem 13.8.2. □


Theorem 21.5.5 (The central limit theorem) Let the conditions of Theorem 21.5.1 be met and $\mathbf{E}\tau^2 < \infty$, $\mathbf{E}\zeta^2 < \infty$. Then

\[
\frac{S(t) - rt}{d\sqrt{t/a}} \Rightarrow \Phi_{0,1}, \qquad t \to \infty,
\]

where $r = a_{\zeta}/a$, $d^2 = \mathbf{D}(\zeta - r\tau)$.

The proof, as in the case of Theorems 21.5.2-21.5.4, repeats, up to evident changes, that of Theorem 13.8.4. □

Here an analogue of Theorem 13.8.5 (on the conditions of existence of variance and on an identity for $a^{-1} d^2$) looks more complicated than under the conditions of Sect. 13.8 and is omitted.


21.6  Diffusion  Processes 

Now we will consider an important class of Markov processes with continuous
trajectories.

Definition 21.6.1 A homogeneous Markov process $\xi(t)$ with state space $\langle \mathbb{R}, \mathfrak{B} \rangle$
and the transition function $P(t, x, B)$ is said to be a diffusion process if, for some
finite functions $a(x)$ and $b^2(x) > 0$,

(1) $\lim_{\Delta \to 0} \frac{1}{\Delta} \int (y - x)\,P(\Delta, x, dy) = a(x)$,

(2) $\lim_{\Delta \to 0} \frac{1}{\Delta} \int (y - x)^2\,P(\Delta, x, dy) = b^2(x)$,

(3) for some $\delta > 0$ and $c < \infty$,

$$\int |y - x|^{2+\delta}\,P(\Delta, x, dy) < c\,\Delta^{1+\delta/2}.$$


Put $\Delta\xi(t) := \xi(t + \Delta) - \xi(t)$. Then the above conditions can be written in the
form:

$$\mathbf{E}[\Delta\xi(t) \mid \xi(t) = x] \sim a(x)\Delta,$$

$$\mathbf{E}[(\Delta\xi(t))^2 \mid \xi(t) = x] \sim b^2(x)\Delta,$$

$$\mathbf{E}[|\Delta\xi(t)|^{2+\delta} \mid \xi(t) = x] < c\,\Delta^{1+\delta/2} \quad \text{as } \Delta \to 0.$$

The coefficients $a(x)$ and $b(x)$ are called the shift and diffusion coefficients,
respectively. Condition (3) is an analogue of the Lyapunov condition. It could be
replaced with a Lindeberg type condition:

(3a) $\mathbf{E}[(\Delta\xi(t))^2;\ |\Delta\xi(t)| > \varepsilon] = o(\Delta)$ for any $\varepsilon > 0$ as $\Delta \to 0$.


It follows immediately from condition (3) and the Kolmogorov theorem that a
diffusion process $\xi(t)$ can be thought of as a process with continuous trajectories.
The standard Wiener process $w(t)$ is a diffusion process, since in that case

$$P(t; x, B) = \frac{1}{\sqrt{2\pi t}} \int_B e^{-(x-y)^2/(2t)}\,dy,$$

$$\mathbf{E}\Delta w(t) = 0, \qquad \mathbf{E}[\Delta w(t)]^2 = \Delta, \qquad \mathbf{E}[\Delta w(t)]^4 = 3\Delta^2.$$

Therefore the Wiener process has zero shift and a constant diffusion coefficient.
Clearly, the process $w(t) + at$ will have shift $a$ and the same diffusion coefficient.
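The last remark can be illustrated numerically. The following is a minimal sketch, not part of the original text: it simulates increments of $w(t) + at$ over a small step $\Delta$ and estimates the shift and the squared diffusion coefficient from the first two conditional moments. The function name `estimate_coefficients`, the step size, the sample size and the seed are all illustrative assumptions.

```python
import math
import random


def estimate_coefficients(a, delta, n_samples, seed=0):
    """Estimate the shift and squared diffusion coefficient of w(t) + a*t.

    An increment over a step of length delta is a*delta + sqrt(delta)*N(0, 1).
    Conditions (1) and (2) say that the first two moments of the increment,
    divided by delta, approach a and b^2 = 1 as delta -> 0.
    """
    rng = random.Random(seed)
    sum_first = 0.0
    sum_second = 0.0
    for _ in range(n_samples):
        increment = a * delta + math.sqrt(delta) * rng.gauss(0.0, 1.0)
        sum_first += increment
        sum_second += increment * increment
    shift_estimate = sum_first / (n_samples * delta)
    # For a fixed delta the second moment over delta equals 1 + a^2 * delta.
    b2_estimate = sum_second / (n_samples * delta)
    return shift_estimate, b2_estimate


shift_hat, b2_hat = estimate_coefficients(a=2.0, delta=0.01, n_samples=200_000)
```

For a fixed step the second estimate targets $1 + a^2\Delta$ rather than 1, which is why condition (2) is stated as a limit as $\Delta \to 0$.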

We saw in Sect. 21.2 that the “local” characteristic Q of a Markov process $\xi(t)$
with a discrete state space X specifies uniquely the evolution law of the process.
A similar situation takes place for diffusion processes: the distribution of the process
is determined uniquely by the coefficients $a(x)$ and $b(x)$. The way to establish
this fact again lies via the Chapman-Kolmogorov equation.


Theorem 21.6.1 If the transition probability $P(t; x, B)$ of a diffusion process is
twice continuously differentiable with respect to $x$, then $P(t; x, B)$ is differentiable
with respect to $t$ and satisfies the equation

$$\frac{\partial P}{\partial t} = a\,\frac{\partial P}{\partial x} + \frac{b^2}{2}\,\frac{\partial^2 P}{\partial x^2} \qquad (21.6.1)$$

with the initial condition

$$P(0; x, B) = I_B(x). \qquad (21.6.2)$$


Remark 21.6.1 The conditions of the theorem on smoothness of the transition function
$P$ can actually be proved under the assumption that $a$ and $b$ are continuous,
$b \ge b_0 > 0$, $|a| \le c(|x| + 1)$ and $b^2 \le c(|x| + 1)$.
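Proof of Theorem 21.6.1 For brevity’s sake denote by $P'_t$, $P'_x$ and $P''_x$ the partial
derivatives $\partial P/\partial t$, $\partial P/\partial x$ and $\partial^2 P/\partial x^2$, respectively, and make use of the relation

$$P(t; y, B) - P(t; x, B) = (y - x) P'_x + \frac{(y - x)^2}{2} P''_x + \frac{(y - x)^2}{2}\,[P''_x(t; y_x, B) - P''_x(t; x, B)], \qquad y_x \in (x, y). \qquad (21.6.3)$$

Then by the Chapman-Kolmogorov equation

$$P(t + \Delta; x, B) - P(t; x, B) = \int P(\Delta; x, dy)\,[P(t; y, B) - P(t; x, B)] = a(x) P'_x \Delta + \frac{b^2(x)}{2} P''_x \Delta + o(\Delta) + R, \qquad (21.6.4)$$

where

$$R = \int \frac{(y - x)^2}{2}\,[P''_x(t; y_x, B) - P''_x(t; x, B)]\,P(\Delta; x, dy) = \int_{|y - x| \le \varepsilon} + \int_{|y - x| > \varepsilon}.$$

The first integral, by virtue of the continuity of $P''_x$, does not exceed

$$\delta(\varepsilon) \left[ \frac{b^2(x)}{2} \Delta + o(\Delta) \right],$$

where $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$; the second integral is $o(\Delta)$ by condition (3a). Since $\varepsilon$ is
arbitrary, one has $R = o(\Delta)$ and it follows from the above that

$$P'_t = \lim_{\Delta \to 0} \frac{P(t + \Delta; x, B) - P(t; x, B)}{\Delta} = a(x) P'_x + \frac{b^2(x)}{2} P''_x.$$

This proves (21.6.1). The theorem is proved. □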


□ 


It is known from the theory of differential equations that, under wide assumptions
about the coefficients $a$ and $b$ and for $B = (-\infty, z)$, the Cauchy problem (21.6.1)-(21.6.2)
has a unique solution $P$ which is infinitely many times differentiable with
respect to $t$, $x$ and $z$. From this it follows that $P(t; x, B)$ has a density $p(t; x, z)$
which is the fundamental solution of (21.6.1).

It is also not difficult to derive from Theorem 21.6.1 that, along with $P(t; x, B)$,
the function

$$u(t, x) := \int g(z)\,P(t; x, dz) = \mathbf{E}[g(\xi^{(x)}(t))]$$

will also satisfy Eq. (21.6.1) for any smooth function $g$ with a compact support,
$\xi^{(x)}(t)$ being the diffusion process with the initial value $\xi^{(x)}(0) = x$.

In the proof of Theorem 21.6.1 we considered (see (21.6.4)) the time increment
$\Delta$ preceding the main time interval. In this connection Eqs. (21.6.1) are called
backward Kolmogorov equations. Forward equations can be derived in a similar way.
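As a sanity check of the backward equation in the simplest case, the sketch below (an illustration, not taken from the text) evaluates $\partial p/\partial t - \frac{1}{2}\,\partial^2 p/\partial x^2$ by central differences for the Wiener transition density, for which $a = 0$ and $b = 1$. The helper names and the grid step `h` are assumptions made for the example.

```python
import math


def wiener_density(t, x, y):
    """Transition density p(t; x, y) of the standard Wiener process."""
    return math.exp(-(y - x) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def backward_residual(t, x, y, h=1e-3):
    """Central-difference value of dp/dt - (1/2) d^2p/dx^2 at (t, x), y fixed."""
    dp_dt = (wiener_density(t + h, x, y) - wiener_density(t - h, x, y)) / (2.0 * h)
    d2p_dx2 = (
        wiener_density(t, x + h, y)
        - 2.0 * wiener_density(t, x, y)
        + wiener_density(t, x - h, y)
    ) / (h * h)
    return dp_dt - 0.5 * d2p_dx2


# The residual should vanish up to discretisation error at every test point.
residuals = [
    backward_residual(t, x, 0.3)
    for t in (0.5, 1.0, 2.0)
    for x in (-1.0, 0.0, 0.7)
]
```

The derivatives are taken in the initial point $x$, not in the terminal point $y$, which is exactly what distinguishes the backward equation from the forward one.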
Theorem 21.6.2 (Forward Kolmogorov equations) Let the transition density
$p(t; x, y)$ be such that the derivatives

$$\frac{\partial}{\partial y}[a(y) p(t; x, y)] \quad \text{and} \quad \frac{\partial^2}{\partial y^2}[b^2(y) p(t; x, y)]$$

exist and are continuous. Then $p(t; x, y)$ satisfies the equation

$$Dp := \frac{\partial p}{\partial t} + \frac{\partial}{\partial y}[a(y) p(t; x, y)] - \frac{1}{2} \frac{\partial^2}{\partial y^2}[b^2(y) p(t; x, y)] = 0. \qquad (21.6.5)$$

Proof Let $g(y)$ be a smooth function with a bounded support,

$$u(t, x) := \mathbf{E}g(\xi^{(x)}(t)) = \int g(y)\,p(t; x, y)\,dy.$$

Then

$$u(t + \Delta, x) - u(t, x) = \int p(t; x, z) \left[ \int p(\Delta; z, y) g(y)\,dy - \int p(\Delta; z, y) g(z)\,dy \right] dz. \qquad (21.6.6)$$

Expanding the difference $g(y) - g(z)$ into a series, we obtain in the same way as in
the proof of Theorem 21.4.1 that, by virtue of properties (1)-(3), the expression in
the brackets is

$$\left[ a(z) g'(z) + \frac{b^2(z)}{2} g''(z) \right] \Delta + o(\Delta).$$

This implies that there exists the derivative

$$\frac{\partial u}{\partial t} = \int p(t; x, z)\,a(z) g'(z)\,dz + \frac{1}{2} \int p(t; x, z)\,b^2(z) g''(z)\,dz.$$

Integrating by parts we get

$$\int \left\{ -\frac{\partial p}{\partial t} - \frac{\partial}{\partial z}[a(z) p(t; x, z)] + \frac{1}{2} \frac{\partial^2}{\partial z^2}[b^2(z) p(t; x, z)] \right\} g(z)\,dz = 0$$

or, which is the same,

$$\int Dp(t; x, z)\,g(z)\,dz = 0.$$

Since $g$ is arbitrary, (21.6.5) follows. The theorem is proved. □


As in the case of discrete X, the difference between the forward and backward
Kolmogorov equations becomes more apparent for non-homogeneous diffusion
processes, when the transition probabilities $P(s, x; t, B)$ depend on two time
variables, while $a$ and $b$ in conditions (1)-(3) are functions of $s$ and $x$. Then the
backward Kolmogorov equation (for densities) will relate the derivatives of the transition
densities $p(s, x; t, y)$ with respect to the first two variables, while the forward
equation will hold for the derivatives with respect to the last two variables.
We return to homogeneous diffusion processes. One can study conditions ensuring
the existence of the limiting stationary distribution of $\xi^{(x)}(t)$ as $t \to \infty$ which
is independent of $x$ using the same approach as in Sect. 21.2. Theorem 21.2.3 will
remain valid (one simply has to replace $i_0$ in it with $x_0$, in agreement with the
notation of the present section). The proof of Theorem 21.2.3 also remains valid, but
will need a somewhat more precise argument (in the new situation, on the event $B_{dv}$
one has $\xi(t - v) \in dx_0$ instead of $\xi(t - v) = x_0$).

If the stationary distribution density

$$\lim_{t \to \infty} p(t; x, y) = p(y) \qquad (21.6.7)$$

exists, how could one find it? Since the dependence of $p(t; x, y)$ on $t$ and $x$
vanishes as $t \to \infty$, the backward Kolmogorov equations turn into the identity $0 = 0$ as
$t \to \infty$. Turning to the forward equations and passing in (21.6.6) to the limit first as
$t \to \infty$ and then as $\Delta \to 0$, we come, using the same argument as in the proof of
Theorem 21.2.3, to the following conclusion.

Corollary 21.6.1 If (21.6.7) and the conditions of Theorem 21.6.2 hold, then the
stationary density $p(y)$ satisfies the equation

$$-[a(y) p(y)]' + \frac{1}{2}[b^2(y) p(y)]'' = 0$$

(which is obtained from (21.6.5) if we put $\partial p/\partial t = 0$).


Example 21.6.1 The Ornstein-Uhlenbeck process

$$\xi^{(x)}(t) = x e^{at} + \sigma e^{at}\, w\!\left(\frac{1 - e^{-2at}}{2a}\right),$$

where $w(u)$ is the standard Wiener process, is a homogeneous diffusion process
with the transition density

$$p(t; x, y) = \frac{1}{\sqrt{2\pi}\,\sigma(t)} \exp\left\{-\frac{(y - x e^{at})^2}{2\sigma^2(t)}\right\}, \qquad \sigma^2(t) = \frac{\sigma^2}{2a}\,(e^{2at} - 1). \qquad (21.6.8)$$

We leave it to the reader to verify that this process has coefficients $a(x) = ax$,
$b(x) = \sigma = \text{const}$, and that function (21.6.8) satisfies the forward and backward
equations. For $a < 0$, there exists a stationary process (the definition is given in the
next chapter)

$$\xi(t) = \sigma e^{at}\, w\!\left(-\frac{e^{-2at}}{2a}\right)$$

of which the density (which does not depend on $t$) is equal to

$$p(y) = \lim_{t \to \infty} p(t; x, y) = \frac{1}{\sqrt{2\pi}\,\sigma(\infty)} \exp\left\{-\frac{y^2}{2\sigma^2(\infty)}\right\}, \qquad \sigma^2(\infty) = -\frac{\sigma^2}{2a}.$$
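The stationary density of the example can be checked against the equation of Corollary 21.6.1 with $a(y) = ay$ and $b(y) = \sigma$. The sketch below is illustrative only: the parameter values, the step `h` and the function names are assumptions, and the derivatives are approximated by central differences.

```python
import math


def stationary_density(y, a, sigma):
    """Normal density with mean 0 and variance -sigma^2 / (2a), for a < 0."""
    var = -sigma ** 2 / (2.0 * a)
    return math.exp(-y * y / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def stationary_residual(y, a, sigma, h=1e-3):
    """Finite-difference value of -[a*y*p(y)]' + (1/2) [sigma^2 p(y)]''."""

    def p(z):
        return stationary_density(z, a, sigma)

    def drift_term(z):
        return a * z * p(z)

    d_drift = (drift_term(y + h) - drift_term(y - h)) / (2.0 * h)
    d2_p = (p(y + h) - 2.0 * p(y) + p(y - h)) / (h * h)
    return -d_drift + 0.5 * sigma ** 2 * d2_p


# The residual should be zero up to discretisation error for a < 0.
ou_residuals = [
    stationary_residual(y, a=-0.8, sigma=1.3) for y in (-2.0, -0.5, 0.0, 1.0, 2.5)
]
```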
In conclusion of this section we will consider the problem, important for various
applications, of finding the probability that the trajectory of a diffusion process will
not leave a given strip. For simplicity’s sake we confine ourselves to considering
this problem for the Wiener process. Let $c > 0$ and $d < 0$.

Put

$$U(t; x, B) := \mathbf{P}\bigl(w^{(x)}(u) \in (d, c) \text{ for all } u \in [0, t];\ w^{(x)}(t) \in B\bigr) = \mathbf{P}\Bigl(\sup_{u \le t} w^{(x)}(u) < c,\ \inf_{u \le t} w^{(x)}(u) > d,\ w^{(x)}(t) \in B\Bigr).$$

Leaving out the verification of the fact that the function $U$ is twice continuously
differentiable, we will only prove the following proposition.

Theorem 21.6.3 The function $U$ satisfies Eq. (21.6.1) with the initial condition

$$U(0; x, B) = I_B(x) \qquad (21.6.9)$$

and boundary conditions

$$U(t; c, B) = U(t; d, B) = 0. \qquad (21.6.10)$$


Proof First of all note that the function $U(t; x, B)$ for $x \in (d, c)$ satisfies conditions
(1)-(3) imposed on the transition function $P(t; x, B)$. Indeed, consider, for instance,
property (1).

We have to verify that

$$\int_d^c (y - x)\,U(\Delta; x, dy) = \Delta a(x) + o(\Delta) \qquad (21.6.11)$$

(with $a(x) = 0$ in our case). But $U(t; x, B) = P(t; x, B) - V(t; x, B)$, where

$$V(t; x, B) = \mathbf{P}\Bigl(\Bigl\{\sup_{u \le t} w^{(x)}(u) \ge c \ \text{or}\ \inf_{u \le t} w^{(x)}(u) \le d\Bigr\} \cap \bigl\{w^{(x)}(t) \in B\bigr\}\Bigr),$$

and

$$\left|\int_d^c (y - x)\,V(\Delta; x, dy)\right| \le \max(c, -d) \left[ \mathbf{P}\Bigl(\sup_{u \le \Delta} w^{(x)}(u) \ge c\Bigr) + \mathbf{P}\Bigl(\inf_{u \le \Delta} w^{(x)}(u) \le d\Bigr) \right].$$

The first probability in the brackets is given, as we know (see (20.2.1) and
Theorem 19.2.2), by the value

$$2\mathbf{P}\bigl(w^{(x)}(\Delta) > c\bigr) = 2\mathbf{P}\Bigl(w(1) > \frac{c - x}{\sqrt{\Delta}}\Bigr).$$

For any $x < c$ and $k > 0$, it is $o(\Delta^k)$. The same holds for the second probability.
Therefore (21.6.11) is proved. In the same way one can verify properties (2) and (3).
Further, because by the total probability formula, for $x \in (d, c)$,

$$U(t + \Delta; x, B) = \int_d^c U(\Delta; x, dy)\,U(t; y, B),$$

using an expansion of the form (21.6.3) for the function $U$, we obtain in the same
way as in (21.6.4) that

$$U(t + \Delta; x, B) - U(t; x, B) = \int U(\Delta; x, dy)\,[U(t; y, B) - U(t; x, B)] = a(x) \frac{\partial U}{\partial x} \Delta + \frac{b^2(x)}{2} \frac{\partial^2 U}{\partial x^2} \Delta + o(\Delta).$$

This implies that $\partial U/\partial t$ exists and that Eq. (21.6.1) holds for the function $U$.

That the boundary and initial conditions are met is obvious. The theorem is
proved. □


The reader can verify that the function

$$u(t; x, y) := \frac{\partial}{\partial y} U(t; x, (-\infty, y)), \qquad y \in (d, c),$$

playing the role of the fundamental solution to the boundary problem (21.6.9)-(21.6.10)
(the function $u$ satisfies (21.6.1) with the boundary conditions (21.6.10)
and the initial conditions degenerating into the $\delta$-function), is equal to


u(t;  x,  y) 


[y  +  2k (c  -  d)]2 
2 1 


-£exp- 

k=0 

oo 

-Eexp- 

k= o 


\y  —  2c  —  2  k(c  —  d)]2 

[y  —  2d  —  2  k(c  —  d)]2 
2t 


This  expression  can  also  be  obtained  directly  from  probabilistic  considerations  (see, 
e.g.,  [32]). 
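For readers who wish to experiment, here is a hedged numerical sketch of such a density. It uses the standard reflection (method of images) series for a Wiener process started at $x$ and killed on leaving $(d, c)$; the indexing of the sums may differ from the formula printed above, and the truncation level `n_terms`, the grid size and the function names are assumptions of the example.

```python
import math


def strip_density(t, x, y, d, c, n_terms=50):
    """Image-series density of w^{(x)}(t) on the event that the path stays in (d, c).

    Standard reflection series (an assumption of this sketch):
    sum over k of N(y; x - 2k(c-d), t) - N(y; 2c - x - 2k(c-d), t).
    """
    width = c - d
    total = 0.0
    for k in range(-n_terms, n_terms + 1):
        shift = 2.0 * k * width
        total += math.exp(-(y - x + shift) ** 2 / (2.0 * t))
        total -= math.exp(-(y + x - 2.0 * c + shift) ** 2 / (2.0 * t))
    return total / math.sqrt(2.0 * math.pi * t)


def survival_probability(t, x, d, c, n_grid=2000):
    """Midpoint-rule integral of the density over (d, c): P(no exit up to time t)."""
    step = (c - d) / n_grid
    return step * sum(
        strip_density(t, x, d + (i + 0.5) * step, d, c) for i in range(n_grid)
    )


# Boundary conditions (21.6.10): the density vanishes at y = c and y = d.
boundary_values = (
    strip_density(0.7, 0.2, 1.0, -1.5, 1.0),
    strip_density(0.7, 0.2, -1.5, -1.5, 1.0),
)
p_short = survival_probability(0.05, 0.0, -1.5, 1.0)  # exit is very unlikely
p_long = survival_probability(5.0, 0.0, -1.5, 1.0)    # exit is very likely
```

Integrating the density over $(d, c)$ gives $U(t; x, (d, c))$, the probability that the trajectory has not left the strip by time $t$.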


Chapter  22 

Processes  with  Finite  Second  Moments. 
Gaussian  Processes 


Abstract The chapter is devoted to the classical “second-order theory” of
time-homogeneous processes with finite second moments. Section 22.1 explores the
relationships between the covariance function properties and those of the process itself
and proves the ergodic theorem (in quadratic mean) for processes with covariance
functions vanishing at infinity. Section 22.2 is devoted to the special case of
Gaussian processes, while Sect. 22.3 solves the best linear prediction problem.


22.1  Processes  with  Finite  Second  Moments 

Let $\{\xi(t),\ -\infty < t < \infty\}$ be a random process for which there exist the moments
$a(t) = \mathbf{E}\xi(t)$ and $R(t, u) = \mathbf{E}\xi(t)\xi(u)$. Since it is always possible to study the
process $\xi(t) - a(t)$ instead of $\xi(t)$, we can assume without loss of generality that
$a(t) = 0$.


Definition 22.1.1 The function $R(t, u)$ is said to be the covariance function of the
process $\xi(t)$.

Definition 22.1.2 A function $R(t, u)$ is said to be nonnegative (positive) definite if,
for any $k$; $u_1, \dots, u_k$; $a_1, \dots, a_k \ne 0$,

$$\sum_{i,j} a_i a_j R(u_i, u_j) \ge 0 \quad (> 0).$$

It is evident that the covariance function $R(t, u)$ is nonnegative definite, because

$$\sum_{i,j} a_i a_j R(u_i, u_j) = \mathbf{E}\Bigl(\sum_i a_i \xi(u_i)\Bigr)^2 \ge 0.$$

Definition 22.1.3 A process $\xi(t)$ is said to be unpredictable if no linear combination
of the variables $\xi(u_1), \dots, \xi(u_k)$ is zero with probability 1, i.e. if there exist no
$u_1, \dots, u_k$; $a_1, \dots, a_k$ such that

$$\mathbf{P}\Bigl(\sum_i a_i \xi(u_i) = 0\Bigr) = 1.$$
If $R(t, u)$ is the covariance function of an unpredictable process, then $R(t, u)$
is positive definite. We will see below that the converse assertion is also true in a
certain sense.

Unpredictability means that we cannot represent $\xi(t_k)$ as a linear combination of
$\xi(t_j)$, $j < k$.

Example 22.1.1 The process $\xi(t) = \sum_{k=1}^{N} \xi_k g_k(t)$, where $g_k(t)$ are linearly
independent and $\xi_k$ are independent, is not unpredictable, because from $\xi(t_1), \dots, \xi(t_N)$
we can determine the values $\xi(t)$ for all other $t$.

Consider the Hilbert space $L_2$ of all random variables $\eta$ on $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$ having
finite second moments, $\mathbf{E}\eta = 0$, endowed with the inner product $(\eta_1, \eta_2) = \mathbf{E}\eta_1\eta_2$
corresponding to the distance $\|\eta_1 - \eta_2\| = [\mathbf{E}(\eta_1 - \eta_2)^2]^{1/2}$. Convergence in $L_2$ is
obviously convergence in mean quadratic.

A random process $\xi(t)$ may be thought of as a curve in $L_2$.

Definition 22.1.4 A random process $\xi(t)$ is said to be wide sense stationary if the
function $R(t, u) =: R(t - u)$ depends on the difference $t - u$ only. The function
$R(s)$ is called nonnegative (positive) definite if the function $R(t, t + s)$ is of the
respective type. For brevity, we will often call wide sense stationary processes simply
stationary.


For the Wiener process, $R(t, u) = \mathbf{E}w(t)w(u) = \min(t, u)$, so that $w(t)$ cannot
be stationary. But the process $\xi(t) = w(t + 1) - w(t)$ will already be stationary.
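The stationarity of the increment process can be verified directly from $\mathbf{E}w(t)w(u) = \min(t, u)$. The following short computation is a worked illustration added for convenience (it assumes $t \ge 0$ and $s \ge 0$; the case $s < 0$ follows by symmetry):

```latex
\begin{aligned}
\mathbf{E}\,\xi(t)\xi(t+s)
  &= \mathbf{E}\,[w(t+1) - w(t)]\,[w(t+s+1) - w(t+s)] \\
  &= \min(t+1,\, t+s+1) - \min(t+1,\, t+s) - \min(t,\, t+s+1) + \min(t,\, t+s) \\
  &= (t+1) - \min(t+1,\, t+s) - t + t \\
  &= \max(0,\, 1 - s).
\end{aligned}
```

The result depends on $s$ only, so $R(s) = \max(0, 1 - |s|)$, which is the length of the overlap of the intervals $[t, t+1]$ and $[t+s, t+s+1]$.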

It is obvious that, for a stationary process, the function $R(s)$ is even and $\mathbf{E}\xi^2(t) =
R(0) = \text{const}$. For simplicity’s sake, put $R(0) = 1$. Then, by the Cauchy-Bunjakovsky
inequality,

$$|R(s)| = |\mathbf{E}\xi(t)\xi(t + s)| \le [\mathbf{E}\xi^2(t)\,\mathbf{E}\xi^2(t + s)]^{1/2} = R(0) = 1.$$


Theorem 22.1.1

(1) A process $\xi(t)$ is continuous in mean quadratic ($\xi(t + \Delta) \xrightarrow{(2)} \xi(t)$ as $\Delta \to 0$)
if and only if the function $R(u)$ is continuous at zero.

(2) If the function $R(u)$ is continuous at zero, then it is continuous everywhere.


Proof

(1) $\|\xi(t + \Delta) - \xi(t)\|^2 = \mathbf{E}(\xi(t + \Delta) - \xi(t))^2 = 2R(0) - 2R(\Delta)$.

(2) $$R(t + \Delta) - R(t) = (\xi(0),\ \xi(t + \Delta) - \xi(t)) \le \|\xi(t + \Delta) - \xi(t)\| = \sqrt{2(R(0) - R(\Delta))}. \qquad (22.1.1)$$

The theorem is proved. □


A process $\xi(t)$ continuous in mean quadratic will be stochastically continuous,
as we can see from Chaps. 6 and 18. The continuity in mean quadratic does not,
however, imply path-wise continuity. The reader can verify this by considering the
example of the process

$$\xi(t) = \eta(t + 1) - \eta(t) - 1,$$

where $\eta(t)$ is the Poisson process with parameter 1. For that process, the covariance
function

$$R(t) = \begin{cases} 0 & \text{for } t \ge 1, \\ 1 - t & \text{for } 0 \le t \le 1 \end{cases}$$

is continuous, although the trajectories of $\xi(t)$ are not. If

$$|R(\Delta) - R(0)| < c\,\Delta^{1+\varepsilon} \qquad (22.1.2)$$

for some $\varepsilon > 0$ then, by the Kolmogorov theorem (see Theorem 18.2.1), $\xi(t)$ has
a continuous modification. From this it follows, in particular, that if $R(t)$ is twice
differentiable at the point $t = 0$, then the trajectories of $\xi(t)$ may be assumed
continuous. Indeed, in that case, since $R(t)$ is even, one has

$$R'(0) = 0 \quad \text{and} \quad R(\Delta) - R(0) \sim \frac{1}{2} R''(0) \Delta^2.$$

As a whole, the smoother the covariance function is at zero, the smoother the
trajectories of $\xi(t)$ are.

Assume that the trajectories of $\xi(t)$ are measurable (for example, belong to the
space $D$).


Theorem 22.1.2 (The simplest ergodic theorem) If

$$R(s) \to 0 \quad \text{as } s \to \infty, \qquad (22.1.3)$$

then

$$\zeta_T := \frac{1}{T} \int_0^T \xi(t)\,dt \xrightarrow{(2)} 0.$$


Proof Clearly,

$$\|\zeta_T\|^2 = \frac{1}{T^2} \int_0^T \int_0^T R(t - u)\,dt\,du.$$

Since $R(s)$ is even,

$$J := \int_0^T \int_0^T R(t - u)\,dt\,du = 2 \int_0^T \int_u^T R(t - u)\,dt\,du.$$

Making the orthogonal change of variables $v = (t - u)/\sqrt{2}$, $s = (t + u)/\sqrt{2}$, we
obtain

$$J \le 2 \int_{s=0}^{T\sqrt{2}} \int_{v=0}^{T/\sqrt{2}} R(v\sqrt{2})\,dv\,ds \le 2T \int_0^T R(v)\,dv,$$

$$\|\zeta_T\|^2 \le \frac{2}{T} \int_0^T R(v)\,dv \to 0.$$

The theorem is proved. □
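The bound obtained in the proof can be checked numerically on a concrete covariance function. The sketch below is an illustration, not part of the text: it takes $R(s) = \max(0, 1 - |s|)$ (the covariance of $w(t + 1) - w(t)$), uses the elementary identity $\int_0^T\!\int_0^T R(t - u)\,dt\,du = 2\int_0^T (T - v) R(v)\,dv$ for even $R$, and compares $\|\zeta_T\|^2$ with $\frac{2}{T}\int_0^T R(v)\,dv$. Function names and the grid size are assumptions.

```python
def triangular_cov(s):
    """R(s) = max(0, 1 - |s|), the covariance function of w(t + 1) - w(t)."""
    return max(0.0, 1.0 - abs(s))


def mean_square_of_average(cov, T, n_grid=20000):
    """||zeta_T||^2 = (2 / T^2) * integral_0^T (T - v) cov(v) dv (midpoint rule)."""
    step = T / n_grid
    acc = 0.0
    for i in range(n_grid):
        v = (i + 0.5) * step
        acc += (T - v) * cov(v)
    return 2.0 * acc * step / (T * T)


def upper_bound(cov, T, n_grid=20000):
    """The bound (2 / T) * integral_0^T cov(v) dv from the proof (midpoint rule)."""
    step = T / n_grid
    return 2.0 * step * sum(cov((i + 0.5) * step) for i in range(n_grid)) / T


# Pairs (||zeta_T||^2, bound) for growing T; both decrease like 1/T here.
values = {
    T: (mean_square_of_average(triangular_cov, T), upper_bound(triangular_cov, T))
    for T in (1.0, 10.0, 100.0)
}
```

For this covariance and $T \ge 1$ the exact values are $\|\zeta_T\|^2 = 1/T - 1/(3T^2)$ and $1/T$ for the bound, so the two agree to the leading order.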
Example 22.1.2 The stationary white noise process $\xi(t)$ is defined as a process
with independent values, i.e. a process such that, for any $t_1, \dots, t_n$, the variables
$\xi(t_1), \dots, \xi(t_n)$ are independent. For such a process,

$$R(t) = \begin{cases} 1 & \text{for } t = 0, \\ 0 & \text{for } t \ne 0, \end{cases}$$

and thus condition (22.1.3) is met. However, one cannot apply Theorem 22.1.2 here,
for the trajectories of $\xi(t)$ will be non-measurable with probability 1 (for example,
the set $B = \{t : \xi(t) > 0\}$ is non-measurable with probability 1).


Definition 22.1.5 A process $\xi(t)$ is said to be strict sense stationary if, for any
$t_1, \dots, t_k$, the distribution of $(\xi(t_1 + u), \xi(t_2 + u), \dots, \xi(t_k + u))$ is independent
of $u$.


It is obvious that if $\xi(t)$ is a strict sense stationary process then

$$\mathbf{E}\xi(t)\xi(u) = \mathbf{E}\xi(t - u)\xi(0) = R(t - u),$$

and $\xi(t)$ will be wide sense stationary. The converse is, of course, not true. However,
there exists a class of processes for which both concepts of stationarity coincide.


22.2  Gaussian  Processes 

Definition 22.2.1 A process $\xi(t)$ is said to be Gaussian if its finite-dimensional
distributions are normal.

We again assume that $\mathbf{E}\xi(t) = 0$ and $R(t, u) = \mathbf{E}\xi(t)\xi(u)$.

The finite-dimensional distributions are completely determined by the ch.f.s ($\lambda =
(\lambda_1, \dots, \lambda_k)$, $\xi = (\xi(t_1), \dots, \xi(t_k))$)

$$\mathbf{E}e^{i(\lambda, \xi)} = \mathbf{E}e^{i \sum_j \lambda_j \xi(t_j)} = e^{-\frac{1}{2} \lambda R \lambda^T},$$

where $R = \|R(t_i, t_j)\|$ and the superscript $T$ stands for transposition, so that

$$\lambda R \lambda^T = \sum_{i,j} \lambda_i \lambda_j R(t_i, t_j).$$

Thus for a Gaussian process the finite-dimensional distributions are completely
determined by the covariance function $R(t, u)$.

We saw that for an unpredictable process $\xi(t)$, the function $R(t, u)$ is positive
definite. A converse assertion may be stated in the following form.

Theorem 22.2.1 If the function $R(t, u)$ is positive definite, then there exists an
unpredictable Gaussian process with the covariance function $R(t, u)$.
Proof For arbitrary $t_1, \dots, t_k$, define the finite-dimensional distribution of the vector
$\xi(t_1), \dots, \xi(t_k)$ via the density

$$p_{t_1, \dots, t_k}(x_1, \dots, x_k) = \frac{\sqrt{|A|}}{(2\pi)^{k/2}} \exp\Bigl\{-\frac{1}{2} x A x^T\Bigr\},$$

where $A$ is the matrix inverse to the covariance matrix $R = \|R(t_i, t_j)\|$ (see
Sect. 7.6) and $|A|$ is the determinant of $A$. These distributions will clearly
be consistent, because the covariance matrices are consistent (the matrix for
$\xi(t_1), \dots, \xi(t_{k-1})$ is a submatrix of $R$). It remains to make use of the Kolmogorov
theorem. The theorem is proved. □

Example 22.2.1 Let $w(t)$ be the standard Wiener process. The process

$$w^0(t) = w(t) - t w(1), \qquad t \in [0, 1],$$

is called the Brownian bridge (its “ends are fixed”: $w^0(0) = w^0(1) = 0$). The
covariance function of $w^0(t)$ is equal to

$$R(t, u) = \mathbf{E}(w(t) - t w(1))(w(u) - u w(1)) = t(1 - u)$$

for $u \ge t$.
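The covariance of the Brownian bridge follows in one line from $\mathbf{E}w(t)w(u) = \min(t, u)$. The worked step below is added as an illustration for $0 \le t \le u \le 1$:

```latex
\begin{aligned}
R(t, u) &= \mathbf{E}\,w(t)w(u) - u\,\mathbf{E}\,w(t)w(1) - t\,\mathbf{E}\,w(1)w(u) + tu\,\mathbf{E}\,w^2(1) \\
        &= t - ut - tu + tu \\
        &= t(1 - u).
\end{aligned}
```

In particular $R(t, t) = t(1 - t)$, which vanishes at both ends of $[0, 1]$, in agreement with $w^0(0) = w^0(1) = 0$.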

A Gaussian wide sense stationary process $\xi(t)$ is strict sense stationary. This
immediately follows from the fact that for $R(t, u) = R(t - u)$ the finite-dimensional
distributions of $\xi(t)$ become invariant with respect to time shift:

$$p_{t_1, \dots, t_k}(x_1, \dots, x_k) = p_{t_1 + u, \dots, t_k + u}(x_1, \dots, x_k)$$

since $\|R(t_i + u, t_j + u)\| = \|R(t_i, t_j)\|$.

If $\xi(t)$ is a Gaussian process, then conditions ensuring the smoothness of its
trajectories can be substantially relaxed in comparison with (22.1.2).

Let for simplicity’s sake the Gaussian process $\xi(t)$ be stationary.


Theorem 22.2.2 If for $h < 1$,

$$|R(h) - R(0)| < c \left[\log \frac{1}{h}\right]^{-\alpha}, \qquad \alpha > 3, \quad c < \infty,$$

then the trajectories of $\xi(t)$ can be assumed continuous.


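Proof We make use of Theorem 18.2.2 and put $\varepsilon(h) = (\log \frac{1}{h})^{-\beta}$ for $1 < \beta <
(\alpha - 1)/2$ (we take logarithms to the base 2). Then

$$\sum_{n=1}^{\infty} \varepsilon(2^{-n}) = \sum_{n=1}^{\infty} n^{-\beta} < \infty,$$

and, by (22.1.1),

$$\mathbf{P}\bigl(|\xi(t + h) - \xi(t)| > \varepsilon(h)\bigr) = 2\left(1 - \Phi\left(\frac{\varepsilon(h)}{\sqrt{2(1 - R(h))}}\right)\right) \le 2\left(1 - \Phi\left(c\,\varepsilon(h) \Bigl(\log \frac{1}{h}\Bigr)^{\alpha/2}\right)\right) = 2\left(1 - \Phi\left(c \Bigl(\log \frac{1}{h}\Bigr)^{\alpha/2 - \beta}\right)\right). \qquad (22.2.1)$$

Since the argument of $\Phi$ increases unboundedly as $h \to 0$, $\gamma = \alpha - 2\beta > 1$, and
by (19.3.1)

$$1 - \Phi(x) \sim \frac{1}{\sqrt{2\pi}\,x}\,e^{-x^2/2} \quad \text{as } x \to \infty,$$

we see that the right-hand side of (22.2.1) does not exceed

$$q(h) := c_1 \Bigl(\log \frac{1}{h}\Bigr)^{\beta - \alpha/2} \exp\left\{-c_2 \Bigl(\log \frac{1}{h}\Bigr)^{\alpha - 2\beta}\right\},$$

so that

$$\sum_{n=1}^{\infty} 2^n q(2^{-n}) = c_1 \sum_{n=1}^{\infty} n^{-\gamma/2} \exp\{-c_2 n^{\gamma} + n \ln 2\} < \infty,$$

because $c_2 > 0$ and $\gamma > 1$. The conditions of Theorem 18.2.2 are met, and so
Theorem 22.2.2 is proved. □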


22.3  Prediction  Problem 


Suppose the distribution of a process $\xi(t)$ is known, and one is given the trajectory of
$\xi(t)$ on a set $B \subset (-\infty, t]$, $B$ being either an interval or a finite collection of points.
What could be said about the value $\xi(t + u)$? Our aim will be to find a random
variable $\zeta$, which is $\mathfrak{F}_B = \sigma(\xi(v), v \in B)$-measurable (and called a prediction) and
such that $\mathbf{E}(\xi(t + u) - \zeta)^2$ assumes the smallest possible value. The answer to that
problem is actually known (see Sect. 4.8):

$$\zeta = \mathbf{E}(\xi(t + u) \mid \mathfrak{F}_B).$$


Let $\xi(t)$ be a Gaussian process, $B = \{t_1, \dots, t_k\}$, $t_1 < t_2 < \dots < t_k < t_0 = t + u$,
$A = (\sigma^2)^{-1} = \|a_{ij}\|$ and $\sigma^2 = \|\mathbf{E}\xi(t_i)\xi(t_j)\|_{i,j = 0, \dots, k}$. Then the distribution of
the vector $(\xi(t_1), \dots, \xi(t_k), \xi(t_0))$ has the density

$$f(x_1, \dots, x_k, x_0) = \frac{\sqrt{|A|}}{(2\pi)^{(k+1)/2}} \exp\Bigl\{-\frac{1}{2} \sum_{i,j} x_i x_j a_{ij}\Bigr\}$$

and the conditional distribution of $\xi(t_0)$ given $\xi(t_1), \dots, \xi(t_k)$ has density equal to
the ratio

$$\frac{f(x_1, \dots, x_k, x_0)}{\int_{-\infty}^{\infty} f(x_1, \dots, x_k, x_0)\,dx_0}.$$

The exponential part of this ratio has the form

$$\exp\Bigl\{-\frac{a_{00} x_0^2}{2} - \sum_{j=1}^{k} x_0 x_j a_{j0}\Bigr\}.$$

This means that the conditional distribution under consideration is the normal
law $\Phi_{\alpha, d^2}$, where

$$\alpha = -\sum_{j=1}^{k} \frac{x_j a_{j0}}{a_{00}}, \qquad d^2 = \frac{1}{a_{00}}.$$

Thus, in our case the best prediction $\zeta$ is equal to

$$\zeta = -\sum_{j=1}^{k} \frac{\xi(t_j) a_{j0}}{a_{00}}.$$

The mean quadratic error of this prediction equals $\sqrt{1/a_{00}}$.

We have obtained a linear prediction. In the general case, the linearity property
is usually violated.
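The passage from the exponential part of the ratio to the parameters of the conditional normal law is a completion of the square in $x_0$. The following worked step is added for convenience, with $\alpha$ and $d^2$ as defined above:

```latex
\begin{aligned}
-\frac{a_{00} x_0^2}{2} - x_0 \sum_{j=1}^{k} x_j a_{j0}
  &= -\frac{a_{00}}{2}\Bigl(x_0^2 - 2\alpha x_0\Bigr),
     \qquad \alpha = -\frac{1}{a_{00}} \sum_{j=1}^{k} x_j a_{j0}, \\
  &= -\frac{(x_0 - \alpha)^2}{2 d^2} + \frac{\alpha^2}{2 d^2},
     \qquad d^2 = \frac{1}{a_{00}}.
\end{aligned}
```

The last term does not depend on $x_0$ and is absorbed into the normalising constant, so the conditional density in $x_0$ is normal with mean $\alpha$ and variance $d^2$.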

Consider now the problem of the best linear prediction in the case of an arbitrary
process $\xi(t)$ with finite second moments. For simplicity’s sake we assume again that
$B = \{t_1, \dots, t_k\}$.

Denote by $H(\xi)$ the subspace of $L_2$ generated by the random variables $\xi(t)$,
$-\infty < t < \infty$, and by $H_B(\xi)$ the subspace of $H(\xi)$ generated by (or spanned by)
$\xi(t_1), \dots, \xi(t_k)$. Elements of $H_B(\xi)$ have the form

$$\sum_{j=1}^{k} a_j \xi(t_j).$$

The existence and the form of the best linear prediction in this case are
established by the following assertion.


Theorem 22.3.1 There exists a unique point $\zeta \in H_B(\xi)$ (the projection of $\xi(t + u)$
onto $H_B(\xi)$, see Fig. 22.1) such that

$$\xi(t + u) - \zeta \perp H_B(\xi). \qquad (22.3.1)$$

Relation (22.3.1) is equivalent to

$$\|\xi(t + u) - \zeta\| = \min_{\theta \in H_B(\xi)} \|\xi(t + u) - \theta\|. \qquad (22.3.2)$$

Explicit formulas for the coefficients $a_j$ in the representation $\zeta = \sum_j a_j \xi(t_j)$ are
given in the proof.

Proof Relation (22.3.1) is equivalent to the equations

$$(\xi(t + u) - \zeta,\ \xi(t_j)) = 0, \qquad j = 1, \dots, k.$$

Fig. 22.1 Illustration to Theorem 22.3.1: the point $\zeta$ is the projection of $\xi(t + u)$
onto $H_B(\xi)$

Substituting here

$$\zeta = \sum_{l=1}^{k} a_l \xi(t_l) \in H_B(\xi),$$

we obtain

$$R(t + u, t_j) = \sum_{l=1}^{k} a_l R(t_j, t_l), \qquad j = 1, \dots, k, \qquad (22.3.3)$$

or, in vector form, $R_{t+u} = aR$, where

$$a = (a_1, \dots, a_k), \qquad R_{t+u} = (R(t + u, t_1), \dots, R(t + u, t_k)), \qquad R = \|R(t_i, t_j)\|.$$

If the process $\xi(t)$ is unpredictable, then the matrix $R$ is non-degenerate and
Eq. (22.3.3) has a unique solution:

$$a = R_{t+u} R^{-1}. \qquad (22.3.4)$$

If $\xi(t)$ is not unpredictable, then either $R^{-1}$ still exists and then (22.3.4) holds, or
$R$ is degenerate. In that case, one has to choose from the collection $\xi(t_1), \dots, \xi(t_k)$
only $l < k$ linearly independent elements for which all the above remains true after
replacing $k$ with $l$.
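Formula (22.3.4) is straightforward to evaluate numerically. The sketch below is an illustration under stated assumptions: the covariance $R(t, u) = e^{-|t - u|}$, the observation times and the helper names `solve_linear` and `prediction_coefficients` are all chosen for the example, and the linear system (22.3.3) is solved by plain Gaussian elimination rather than by forming $R^{-1}$.

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def prediction_coefficients(cov, times, target):
    """Coefficients a of the best linear prediction of xi(target) from xi(times).

    Solves (22.3.3): sum_l a_l R(t_j, t_l) = R(target, t_j), j = 1..k.
    R is symmetric, so this is the same as a = R_{t+u} R^{-1}.
    """
    gram = [[cov(ti, tj) for tj in times] for ti in times]
    cross = [cov(target, tj) for tj in times]
    return solve_linear(gram, cross)


def exp_cov(t, u):
    """Illustrative stationary covariance R(t - u) = exp(-|t - u|)."""
    return math.exp(-abs(t - u))


coeffs = prediction_coefficients(exp_cov, [0.0, 0.5, 1.2], 2.0)
```

For this particular covariance only the most recent observation receives a non-zero weight, namely $e^{-(t + u - t_k)}$; this can be checked by substituting into (22.3.3).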

The equivalence of (22.3.1) and (22.3.2) follows from the following considerations.
Let $\theta$ be any other element of $H_B(\xi)$. Then

$$\zeta - \theta \perp \xi(t + u) - \zeta,$$

so that

$$\|\xi(t + u) - \theta\| = \|\xi(t + u) - \zeta + \zeta - \theta\| \ge \|\xi(t + u) - \zeta\|.$$

The theorem is proved. □


Remark 22.3.1 It can happen (in the case where the process $\xi(t)$ is not
unpredictable) that $\xi(t + u) \in H_B(\xi)$. Then the error of the prediction $\zeta$ will be equal
to zero.


Appendix  1 

Extension  of  a  Probability  Measure 


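In this appendix we will prove Carathéodory’s theorem, which was used in Sect. 2.1.

Let $\mathcal{A}$ be an algebra of subsets of $\Omega$ on which a probability measure $\mathbf{P}$, i.e., a
real-valued function satisfying conditions P1-P3 of Chap. 2, is given. Let $\mathcal{P}$ denote the
class of all subsets of $\Omega$. For any $A \in \mathcal{P}$, there always exists a sequence $\{A_n\}_{n=1}^{\infty}$
of disjoint sets from $\mathcal{A}$ such that $\bigcup_{n=1}^{\infty} A_n \supset A$ (it suffices to take $A_1 = \Omega$ and
$A_n = \varnothing$, $n \ge 2$). Denote by $\gamma(A)$ the class of all such sequences and introduce on
$\mathcal{P}$ the real-valued function

$$\mathbf{P}^*(A) := \inf\Bigl\{\sum_{n=1}^{\infty} \mathbf{P}(A_n);\ \{A_n\} \in \gamma(A)\Bigr\}.$$

This function (the outer measure on $\mathcal{P}$ induced by the measure $\mathbf{P}$ on $\mathcal{A}$) has the
following properties:

(1) $\mathbf{P}^*(A) \le \mathbf{P}^*(B) \le 1$ if $A \subset B$.

(2) $\mathbf{P}^*\bigl(\bigcup_{n=1}^{\infty} A_n\bigr) = \sum_{n=1}^{\infty} \mathbf{P}(A_n)$ if the sets $A_n \in \mathcal{A}$, $n = 1, 2, \dots$, are disjoint.

(3) $\mathbf{P}^*\bigl(\bigcup_{n=1}^{\infty} A_n\bigr) \le \sum_{n=1}^{\infty} \mathbf{P}^*(A_n)$ for any $A_1, A_2, \dots \in \mathcal{P}$.

Property (1) is obvious. Property (2) is established by the following argument.
Let $\{B_m\}$ be any sequence from $\gamma(A)$, where $A = \bigcup_{n=1}^{\infty} A_n$. Since $\bigcup_m A_n B_m =
A_n \in \mathcal{A}$, one has $\mathbf{P}(A_n) = \sum_{m=1}^{\infty} \mathbf{P}(A_n B_m)$. Therefore,

$$\sum_{n=1}^{\infty} \mathbf{P}(A_n) = \sum_{n} \sum_{m} \mathbf{P}(A_n B_m) = \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} \mathbf{P}(A_n B_m).$$

But, for each $N < \infty$,

$$\sum_{n=1}^{N} \mathbf{P}(A_n B_m) \le \mathbf{P}(B_m).$$

Hence this inequality holds for $N = \infty$ as well and, for any sequence $\{B_m\}_{m=1}^{\infty} \in \gamma(A)$,

$$\sum_{n=1}^{\infty} \mathbf{P}(A_n) \le \sum_{m=1}^{\infty} \mathbf{P}(B_m).$$

This implies that $\mathbf{P}^*(A) \ge \sum_{n=1}^{\infty} \mathbf{P}(A_n)$. Because the converse inequality is obvious,
we have $\mathbf{P}^*(A) = \sum_{n=1}^{\infty} \mathbf{P}(A_n)$.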

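Proof of property (3) Consider, for some $\varepsilon > 0$, sequences $\{A_{nk}\}_{k=1}^{\infty} \in \gamma(A_n)$ such
that

$$\sum_{k=1}^{\infty} \mathbf{P}(A_{nk}) \le \mathbf{P}^*(A_n) + \frac{\varepsilon}{2^n}.$$

The sequence of sets $\{A_{nk}\}_{n,k=1}^{\infty}$ clearly covers $\bigcup A_n$ and therefore

$$\mathbf{P}^*\Bigl(\bigcup A_n\Bigr) \le \sum_{n} \sum_{k} \mathbf{P}(A_{nk}) \le \sum_{n=1}^{\infty} \mathbf{P}^*(A_n) + \varepsilon.$$

Since $\varepsilon$ is arbitrary, property (3) is proved. □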

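Introduce now the binary operation of symmetric difference $\oplus$ on arbitrary sets
$A$ and $B$ from $\mathcal{P}$ by means of the equality

$$A \oplus B := A\overline{B} \cup \overline{A}B.$$

It is not hard to see that

$$A \oplus B = B \oplus A = \overline{A} \oplus \overline{B} \subset A \cup B, \qquad A \oplus A = \varnothing,$$

$$A \oplus \varnothing = A, \qquad (A \oplus B) \oplus C = A \oplus (B \oplus C).$$

With the help of this operation and the function $\mathbf{P}^*$, we introduce on $\mathcal{P}$ a distance $\rho$
by putting, for any $A, B \in \mathcal{P}$,

$$\rho(A, B) := \mathbf{P}^*(A \oplus B).$$

This construction is quite similar to the one used in Sect. 3.4 (we considered there
the distance $d(A, B) = \mathbf{P}(A \oplus B)$ between measurable sets $A$ and $B$). The properties
of the distance $\rho$ are the same as in (3.4.2). We will need the following properties:

(1) $\rho(A, B) = \rho(B, A) \ge 0$, $\rho(A, A) = 0$,

(2) $\rho(\overline{A}, \overline{B}) = \rho(A, B)$,

(3) $\rho(AB, CD) \le \rho(A, C) + \rho(B, D)$,

(4) $\rho\bigl(\bigcup A_k, \bigcup B_k\bigr) \le \sum_k \rho(A_k, B_k)$.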

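We also note that

(5) $|\mathbf{P}^*(A) - \mathbf{P}^*(B)| \le \rho(A, B)$, and therefore $\mathbf{P}^*(\cdot)$ is a uniformly continuous
function with respect to $\rho$.

Properties (1)-(3) were listed in (3.4.2); in the present context, they are proved in
exactly the same way based on the properties of the measure $\mathbf{P}^*$. Property (4) follows
from property (3) of the measure $\mathbf{P}^*$ and the relation (we put here $A = \bigcup A_n$ and
$B = \bigcup B_n$)

$$A \oplus B \subset \bigcup (A_n \oplus B_n),$$

because

$$A \oplus B = \Bigl[\Bigl(\bigcup A_n\Bigr) \cap \Bigl(\bigcap \overline{B}_n\Bigr)\Bigr] \cup \Bigl[\Bigl(\bigcap \overline{A}_n\Bigr) \cap \Bigl(\bigcup B_n\Bigr)\Bigr] \subset \bigcup A_n \overline{B}_n \cup \bigcup B_n \overline{A}_n = \bigcup (A_n \overline{B}_n \cup \overline{A}_n B_n) = \bigcup (A_n \oplus B_n).$$

Property (5) follows from the fact that

$$A \subset B \cup (A \oplus B), \qquad B \subset A \cup (A \oplus B) \qquad (A1.1)$$

and therefore

$$\mathbf{P}^*(A) - \mathbf{P}^*(B) \le \mathbf{P}^*(A \oplus B) = \rho(A, B),$$

$$\mathbf{P}^*(B) - \mathbf{P}^*(A) \le \mathbf{P}^*(A \oplus B) = \rho(A, B).$$

Similarly to the terminology adopted in Sect. 3.4 we call a set $A \in \mathcal{P}$ approximable
if there exists a sequence $A_n \in \mathcal{A}$ for which $\rho(A, A_n) \to 0$. The totality of all
approximable sets we denote by $\mathfrak{A}$. This is clearly the closure of $\mathcal{A}$ with respect to $\rho$.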


Lemma  A  1.1  21  is  a  a  -algebra. 

Proof  We  verify  that  21  satisfies  properties  At ,  A2r  and  A3  of  o -algebras  of  Chap.  2. 
Property  At:  Q  e  21  is  obvious,  for  A  e  21.  Property  A3  (A  e  21  if  A  e  21)  follows 
from  the  fact  that,  for  A  e  21,  there  exist  An  6  A  such  that,  as  n  — >  oo, 

p(A,  Aji)  >  0,  p(A,  Ayi)  —  p(A,  Ayi)  >  0. 

Finally,  consider  property  A2'.  We  show  first  that  if  An  e  A ,  then  A  =  (J  Aw  e  21. 
Indeed,  we  can  assume  without  loss  of  generality  that  the  An  are  disjoint.  Then, 
by  virtue  of  the  properties  of  the  measure  P*,  for  any  s  >  0, 

£P(A*)<P*tf2)  =  l. 

(n  \  /  oo  \  oo 

A’UAq=p*(  U  Ak)=  J2  p(A*) < £ 

k=  1  /  \  k=n-\-\  J  k=n+ 1 


for  n  large  enough. 
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Now  let  An  e  21.  We  have  to  show  that 

oo 

A  =  {J  An  eH. 

n  —  1 

Let  {Bn}  be  a  sequence  of  sets  from  A  such  that  p(An,  Bn)  <  s/ 2n.  Then  one  has 
B  =  U  Bn  e  21  and,  by  property  (4)  of  the  distance  p, 

oo 

p(A,  B)  <  Bn)  <  s. 

n= 1 


The  lemma  is  proved.  □ 

Now  we  can  prove  the  main  assertion. 

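Theorem A1.1  The probability $\mathbf{P}$ can be extended from the algebra $\mathcal{A}$ to some
probability $\overline{\mathbf{P}}$ given on the $\sigma$-algebra $\mathfrak{A}$.


Proof  For $A \in \mathfrak{A}$, put

$$\overline{\mathbf{P}}(A) := \mathbf{P}^*(A).$$

It is evident that $\overline{\mathbf{P}}(A) = \mathbf{P}(A)$ for $A \in \mathcal{A}$, and $\overline{\mathbf{P}}(\Omega) = 1$. To verify that $\overline{\mathbf{P}}$ is a
probability we just have to prove the countable additivity of $\overline{\mathbf{P}}$. We first prove the
finite additivity. It suffices to prove it for two sets:

$$\mathbf{P}^*(A \cup B) = \mathbf{P}^*(A) + \mathbf{P}^*(B),$$  (A1.2)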


where $A, B \in \mathfrak{A}$ and $A \cap B = \varnothing$. Let $A_n \in \mathcal{A}$ and $B_n \in \mathcal{A}$ be such that $\rho(A, A_n) \to 0$
and $\rho(B, B_n) \to 0$ as $n \to \infty$. Then

$$|\mathbf{P}^*(A \cup B) - \mathbf{P}^*(A_n \cup B_n)| \le \rho(A \cup B, A_n \cup B_n) \le \rho(A, A_n) + \rho(B, B_n) \to 0,$$

$$\mathbf{P}^*(A_n \cup B_n) = \mathbf{P}(A_n \cup B_n) = \mathbf{P}(A_n) + \mathbf{P}(B_n) - \mathbf{P}(A_nB_n).$$  (A1.3)

Here

$$\mathbf{P}(A_n) \to \mathbf{P}^*(A), \qquad \mathbf{P}(B_n) \to \mathbf{P}^*(B),$$

$$\mathbf{P}(A_nB_n) \le \mathbf{P}^*(A_nB) + \mathbf{P}^*(B_n\overline{B})
\le \mathbf{P}^*(A_n\overline{A}) + \mathbf{P}^*(B_n\overline{B}) \le \rho(A, A_n) + \rho(B, B_n) \to 0.$$

Hence (A1.3) implies (A1.2).

¹ The theorem on the extension of a measure to the minimum $\sigma$-algebra containing $\mathcal{A}$ was obtained
by C. Carathéodory. The metrisation of normed Boolean algebras $\mathcal{A}$ by the distance $\rho(A, B) =
\mathbf{P}(A \oplus B)$ was used by many authors (see, e.g., the talk by A.N. Kolmogorov at the 6th Polish
Mathematical Congress in 1948 and Halmos [19]).
It was L.Ya. Savel'ev who suggested the use of the continuity properties of the measure with
respect to the distance $\rho(A, B) = \mathbf{P}^*(A \oplus B)$ in order to extend it.

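We now prove countable additivity. Let $A_n \in \mathfrak{A}$ be disjoint. Then, putting

$$A = \bigcup_{n=1}^{\infty} A_n,$$

we obtain from the finite additivity of the extension $\overline{\mathbf{P}} = \mathbf{P}^*$ on $\mathfrak{A}$ that

$$\overline{\mathbf{P}}(A) = \sum_{k=1}^{n} \overline{\mathbf{P}}(A_k) + \overline{\mathbf{P}}\Big(\bigcup_{k=n+1}^{\infty} A_k\Big).$$

Therefore

$$\overline{\mathbf{P}}(A) \ge \sum_{k=1}^{\infty} \overline{\mathbf{P}}(A_k).$$

On the other hand,

$$\overline{\mathbf{P}}(A) = \mathbf{P}^*(A) \le \sum_{k=1}^{\infty} \mathbf{P}^*(A_k) = \sum_{k=1}^{\infty} \overline{\mathbf{P}}(A_k).$$

The theorem is proved.  □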

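Theorem A1.2  The extension of the probability $\mathbf{P}$ from the algebra $\mathcal{A}$ to the $\sigma$-algebra
$\mathfrak{A}$ is unique.

Proof  Let $\overline{\mathbf{P}}$ denote the extension constructed in Theorem A1.1. Assume that there
exists another probability $\mathbf{P}_1$ on $\mathfrak{A}$, which coincides with
$\mathbf{P}$ on $\mathcal{A}$ and is such that, for some $A \in \mathfrak{A}$,

$$\mathbf{P}_1(A) \ne \overline{\mathbf{P}}(A).$$

Suppose first that $\varepsilon = \mathbf{P}_1(A) - \overline{\mathbf{P}}(A) > 0$. Consider a sequence $\{B_n\} \in \gamma(A)$ such
that

$$\sum_{n=1}^{\infty} \mathbf{P}(B_n) - \overline{\mathbf{P}}(A) < \frac{\varepsilon}{2}.$$

Then

$$\mathbf{P}_1(A) = \overline{\mathbf{P}}(A) + \varepsilon > \sum_{n=1}^{\infty} \mathbf{P}_1(B_n) + \varepsilon/2,$$

which contradicts the assumption that $A \subset \bigcup_{n=1}^{\infty} B_n$. Therefore

$$\mathbf{P}_1(A) \le \overline{\mathbf{P}}(A), \qquad A \in \mathfrak{A}.$$

Since $\overline{\mathbf{P}}$ is $\rho$-continuous at the point $\varnothing$, it follows that $\mathbf{P}_1$ is also $\rho$-continuous at the
point $\varnothing$, and hence at any "point" $A \in \mathfrak{A}$. Indeed, by virtue of (A1.1),

$$|\mathbf{P}_1(A) - \mathbf{P}_1(B)| \le \mathbf{P}_1(A \oplus B) \le \overline{\mathbf{P}}(A \oplus B) \to 0$$

if only $\rho(A, B) = \overline{\mathbf{P}}(A \oplus B) \to 0$. Hence, for $A \in \mathfrak{A}$,

$$\overline{\mathbf{P}}(A) = \lim_{B \to A,\ B \in \mathcal{A}} \mathbf{P}(B) = \lim_{B \to A,\ B \in \mathcal{A}} \mathbf{P}_1(B) = \mathbf{P}_1(A).$$

The theorem is proved.  □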

Let $\mathfrak{A}^* = \sigma(\mathcal{A})$ be the $\sigma$-algebra generated by $\mathcal{A}$. Since $\mathcal{A} \subset \mathfrak{A}$, we have $\mathfrak{A}^* \subset \mathfrak{A}$,
and the next statement follows in an obvious way from the above assertions.

Corollary A1.1  The probability $\mathbf{P}$ can be uniquely extended from the algebra $\mathcal{A}$ to
the $\sigma$-algebra $\mathfrak{A}^*$ generated by $\mathcal{A}$.

Remark A1.1  The $\sigma$-algebra $\mathfrak{A}$ defined above as the closure of the algebra $\mathcal{A}$ with
respect to the introduced distance $\rho$ is in many cases wider than the $\sigma$-algebra $\mathfrak{A}^* =
\sigma(\mathcal{A})$ generated by $\mathcal{A}$. This fact is closely related to the concept of the completion of
a measure. To explain the concept, we assume from the very beginning that $\mathcal{A} = \mathfrak{F}$
is a $\sigma$-algebra. Then the extended measure $\overline{\mathbf{P}}$ can be constructed in a rather simple way. To
do this we extend the measure $\mathbf{P}$ from $(\Omega, \mathfrak{F})$ to a $\sigma$-algebra which is wider than $\mathfrak{F}$
and is constructed as follows. We will say that a subset $N$ of $\Omega$ belongs to the class
$\mathcal{N}$ if there exists an $A = A(N) \in \mathfrak{F}$ such that $N \subset A$ and $\mathbf{P}(A) = 0$. It is not hard to
see that the class of all sets of the form $B \cup N$, where $B \in \mathfrak{F}$ and $N \in \mathcal{N}$, also forms
a $\sigma$-algebra. Denote it by $\mathfrak{F}_{\mathcal{N}}$. Putting $\overline{\mathbf{P}}(B \cup N) := \mathbf{P}(B)$ we obtain an extension of
$\mathbf{P}$ to $(\Omega, \mathfrak{F}_{\mathcal{N}})$. Such a measure is said to be complete, and the above operation itself
is called the completion of the measure $\mathbf{P}$.

Now we can say that the measure $\overline{\mathbf{P}}$ constructed in Theorem A1.1 is complete,
and the $\sigma$-algebra $\mathfrak{A}$ coincides with $\mathfrak{F}_{\mathcal{N}}$.

If, for example, $\Omega = [0, 1]$ and $\mathcal{A}$ is the algebra generated by the intervals, then
$\mathfrak{A}^* = \sigma(\mathcal{A})$ will, as we already know, be the Borel $\sigma$-algebra, and $\mathfrak{A}$ will be the
Lebesgue extension of $\mathfrak{A}^*$ consisting of all "Lebesgue measurable" sets.


Appendix  2 

Kolmogorov’s  Theorem  on  Consistent 
Distributions 


In this appendix we will prove the Kolmogorov theorem asserting that consistent
distributions define a unique probability measure such that the consistent distributions
are its projections. We used this theorem in Sect. 5.5 and in some other places,
where distributions on infinite-dimensional spaces were considered.

Let $T$ be an index set and, for each $t \in T$, let $\mathbb{R}_t$ be the real line $(-\infty, \infty)$. Let
$N \subset T$ be a finite subset of $T$. Then the product space

$$\prod_{t \in N} \mathbb{R}_t = \mathbb{R}^N$$

is a Euclidean space of dimension equal to the number $n$ of elements in $N$, spanned
on $n$ axes of the space

$$\mathbb{R}^T = \prod_{t \in T} \mathbb{R}_t.$$

Assume that, for any finite subset $N \subset T$, a probability measure $\mathbf{P}_N$ is given on
$(\mathbb{R}^N, \mathfrak{B}^N)$, where $\mathfrak{B}^N$ is the $\sigma$-algebra of Borel subsets of $\mathbb{R}^N$. Thereby a family of
measures is given on $\mathbb{R}^T$. The family is said to be consistent if, for any $L \subset N$ and
any Borel set $B$ from $\mathbb{R}^L$,

$$\mathbf{P}_L(B) = \mathbf{P}_N(B \times \mathbb{R}^{N-L}).$$

The measure $\mathbf{P}_L$ is said to be the projection of $\mathbf{P}_N$ onto $\mathbb{R}^L$. A set from $\mathbb{R}^T$ that
can be represented in the form $B \times \mathbb{R}^{T-N}$, where $B \in \mathfrak{B}^N$ and $N$ is a finite set, is
called a cylinder set in $\mathbb{R}^T$. The set $B$ is said to be the base of the cylinder.
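To make the consistency condition concrete, here is a small Python sketch for a discrete analogue: three coordinates taking values in {0, 1}, with a hypothetical joint distribution built from a two-state transition table. Projections are computed by summing out coordinates, and the condition that the measure of a base under the projection equals the measure of the corresponding cylinder is checked directly. All names and numbers are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product

N = ("t1", "t2", "t3")  # hypothetical finite index set


def build_joint():
    """Hypothetical joint pmf on {0,1}^3 built from a two-state chain."""
    init = {0: Fraction(1, 3), 1: Fraction(2, 3)}
    step = {0: {0: Fraction(3, 4), 1: Fraction(1, 4)},
            1: {0: Fraction(1, 2), 1: Fraction(1, 2)}}
    return {x: init[x[0]] * step[x[0]][x[1]] * step[x[1]][x[2]]
            for x in product((0, 1), repeat=3)}


def project(pmf, index, keep):
    """Project a pmf with coordinates `index` onto the coordinates `keep`."""
    pos = [index.index(t) for t in keep]
    out = {}
    for point, p in pmf.items():
        key = tuple(point[i] for i in pos)
        out[key] = out.get(key, Fraction(0)) + p
    return out


def cylinder_prob(pmf, index, keep, base):
    """Mass of the cylinder with base `base` in the coordinates `keep`."""
    pos = [index.index(t) for t in keep]
    return sum((p for point, p in pmf.items()
                if tuple(point[i] for i in pos) in base), Fraction(0))


P_N = build_joint()
L = ("t1", "t3")
P_L = project(P_N, N, L)
BASE = {(0, 1), (1, 1)}  # a "Borel set" in the coordinates t1, t3

# Consistency: P_L(B) equals P_N(B x R^{N-L}).
lhs = sum((P_L[b] for b in BASE), Fraction(0))
rhs = cylinder_prob(P_N, N, L, BASE)
```

Projecting in two steps (first onto two coordinates, then onto one) gives the same result as projecting directly, which is the discrete counterpart of the consistency of the whole family.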

Denote by $\mathfrak{B}^T$ the $\sigma$-algebra of sets from $\mathbb{R}^T$ generated by all cylinder sets.

Theorem A2.1 (Kolmogorov)  If a consistent family of probability measures is given
on $\mathbb{R}^T$, then there exists a unique probability measure $\mathbf{P}$ on $(\mathbb{R}^T, \mathfrak{B}^T)$ such that, for
any $N$, the measure $\mathbf{P}_N$ coincides with the projection of $\mathbf{P}$ onto $\mathbb{R}^N$.

Proof  The cylinder subsets of $\mathbb{R}^T$ form an algebra. We show that, for $B \in \mathfrak{B}^N$, the
relations

$$\mathbf{P}(B \times \mathbb{R}^{T-N}) = \mathbf{P}_N(B)$$  (A2.1)

define a measure on this algebra. First of all, by consistency of the measures $\mathbf{P}_N$,
this definition of probability on cylinder sets is consistent (we mean the cases when
$B = B_1 \times \mathbb{R}^{N-L}$ for $B_1 \in \mathfrak{B}^L$; then the left-hand side of (A2.1) will also be equal
to $\mathbf{P}(B_1 \times \mathbb{R}^{T-L})$). Further, the thus defined probability is additive. Indeed, let
$B_1 \times \mathbb{R}^{T-N_1}$ and $B_2 \times \mathbb{R}^{T-N_2}$ be two disjoint cylinder sets. Then, putting $N = N_1 \cup N_2$,
we will have

$$\begin{aligned}
\mathbf{P}\big((B_1 \times \mathbb{R}^{T-N_1}) \cup (B_2 \times \mathbb{R}^{T-N_2})\big)
&= \mathbf{P}\big(\{(B_1 \times \mathbb{R}^{N-N_1}) \cup (B_2 \times \mathbb{R}^{N-N_2})\} \times \mathbb{R}^{T-N}\big) \\
&= \mathbf{P}_N\big((B_1 \times \mathbb{R}^{N-N_1}) \cup (B_2 \times \mathbb{R}^{N-N_2})\big) \\
&= \mathbf{P}_N(B_1 \times \mathbb{R}^{N-N_1}) + \mathbf{P}_N(B_2 \times \mathbb{R}^{N-N_2}) \\
&= \mathbf{P}(B_1 \times \mathbb{R}^{T-N_1}) + \mathbf{P}(B_2 \times \mathbb{R}^{T-N_2}).
\end{aligned}$$

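To verify that $\mathbf{P}$ is countably additive, we make use of the equivalence of
properties P3 and P3' (see Chap. 2). By this equivalence, it suffices to show that if $\mathcal{B}_n$,
$n = 1, 2, \ldots$, is a decreasing sequence of cylinder sets and, for some $\varepsilon > 0$, we
have $\mathbf{P}(\mathcal{B}_n) > \varepsilon$, $n = 1, 2, \ldots$, then $\mathcal{B} = \bigcap_{n=1}^{\infty} \mathcal{B}_n$ is not empty. Since the $\mathcal{B}_n$ are
enclosed in all the preceding sets, in the representation $\mathcal{B}_n = B_n \times \mathbb{R}^{T-N_n}$ one has
$N_n \subset N_{n+1}$ and $B_{n+1} \cap \mathbb{R}^{N_n} \subset B_n$. Without loss of generality, we will assume that
the number of elements in the set $N_n = \{t_1, \ldots, t_n\}$ is equal to $n$, and denote by $x_i$
(with various superscripts) the coordinates in the space $\mathbb{R}_{t_i}$.

Thus, let

$$\mathbf{P}(\mathcal{B}_n) = \mathbf{P}_{N_n}(B_n) > \varepsilon > 0.$$

We prove that the intersection

$$\mathcal{B} = \bigcap_{n=1}^{\infty} \mathcal{B}_n$$

is non-empty. For any Borel set $B_n \subset \mathbb{R}^{N_n}$, there exists a compactum $K_n$ such that

$$K_n \subset B_n, \qquad \mathbf{P}_{N_n}(B_n - K_n) < \frac{\varepsilon}{2^{n+1}}.$$

Setting $\mathcal{K}_n := K_n \times \mathbb{R}^{T-N_n}$, we obtain

$$\mathbf{P}(\mathcal{B}_n - \mathcal{K}_n) = \mathbf{P}_{N_n}(B_n - K_n) < \frac{\varepsilon}{2^{n+1}}.$$

Introduce the sets $\mathcal{D}_n := \bigcap_{k=1}^{n} \mathcal{K}_k$. It is easy to see that the $\mathcal{D}_n$ are also cylinders.
Because

$$\mathcal{B}_n - \mathcal{D}_n = \bigcup_{k=1}^{n} (\mathcal{B}_n - \mathcal{K}_k) \subset \bigcup_{k=1}^{n} (\mathcal{B}_k - \mathcal{K}_k),$$

we have

$$\mathbf{P}(\mathcal{B}_n - \mathcal{D}_n) \le \mathbf{P}\Big(\bigcup_{k=1}^{n} (\mathcal{B}_k - \mathcal{K}_k)\Big) \le \sum_{k=1}^{n} \mathbf{P}(\mathcal{B}_k - \mathcal{K}_k) < \frac{\varepsilon}{2},$$

$$\mathbf{P}(\mathcal{D}_n) \ge \mathbf{P}(\mathcal{B}_n) - \frac{\varepsilon}{2} \ge \frac{\varepsilon}{2}.$$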

It follows that $\mathcal{D}_n$ is a decreasing sequence of non-empty cylinder sets. Denote by
$X_n = (x_1^n, \ldots, x_n^n)$ an arbitrary point of the base

$$D_n = \bigcap_{k=1}^{n} K_k \times \mathbb{R}^{N_n - N_k}$$

of the cylinder $\mathcal{D}_n$. The point specifies a cylinder subset $\mathcal{X}_n$ of $\mathbb{R}^T$. Since the sets $\mathcal{D}_n$
decrease, we have $(x_1^{n+r}, x_2^{n+r}, \ldots, x_n^{n+r}) \in K_n$ for any $r \ge 0$. By compactness of
the sets $K_n$, we can choose a subsequence $n_{1k}$ such that $x_1^{n_{1k}} \to x_1$ as $k \to \infty$. From this
subsequence, one can choose a subsequence $n_{2k}$ such that $x_2^{n_{2k}} \to x_2$, and so on.

Now consider the diagonal sequence of the points (or, more precisely, cylinder
sets) $X_{n_{kk}} = (x_1^{n_{kk}}, x_2^{n_{kk}}, \ldots, x_{n_{kk}}^{n_{kk}})$. It is clear that

$$X_{n_{kk}} \to X = (x_1, x_2, \ldots)$$

(component-wise) as $k \to \infty$, and that

$$(x_1^{n_{kk}}, \ldots, x_m^{n_{kk}}) \to (x_1, \ldots, x_m) \in K_m$$

for any $m$. This means that, for the set $\mathcal{X}$ corresponding to the point $X$, one has

$$\mathcal{X} := \{y(t) \in \mathbb{R}^T : y(t_1) = x_1,\ y(t_2) = x_2, \ldots\} \subset \mathcal{K}_m \subset \mathcal{B}_m$$

for any $m$, and therefore

$$\mathcal{X} \subset \bigcap_{m=1}^{\infty} \mathcal{B}_m = \mathcal{B}.$$

Thus $\mathcal{B}$ is non-empty, and the countable additivity of $\mathbf{P}$ on the algebra of cylinder
sets is proved. Hence $\mathbf{P}$ is a measure, and it remains to make use of the theorem
on the extension of a measure from an algebra to the $\sigma$-algebra generated by that
algebra.

The theorem is proved.  □


Appendix  3 

Elements  of  Measure  Theory  and  Integration 


In this appendix, the properties of integrals with respect to a measure are presented
in more detail than in Chaps. 4 and 6. We also prove the basic theorems on decomposition
of measure and on convergence of sequences of measures.


3.1  Measure  Spaces 

Let $(\Omega, \mathfrak{F})$ be a measurable space. We will say that a measure space $(\Omega, \mathfrak{F}, \mu)$ is
given if $\mu$ is a nonnegative countably additive set function on $\mathfrak{F}$, i.e. a function
having the following properties:

(1)  $\mu\big(\bigcup_j A_j\big) = \sum_j \mu(A_j)$ for any countable collection of disjoint sets $A_j \in \mathfrak{F}$
($\sigma$-additivity);

(2)  $\mu(A) \ge 0$ for any $A \in \mathfrak{F}$;

(3)  $\mu(\varnothing) = 0$, where $\varnothing$ is the empty set.

The value $\mu(A)$ is called the measure of the set $A$. We will only consider finite
and $\sigma$-finite measures. In the former case one assumes that $\mu(\Omega) < \infty$. In the latter
case there exists a partition of $\Omega$ into countably many sets $A_j$ such that $\mu(A_j) < \infty$.

A probability space is an example of a space with a finite (unit) measure. The
space $(\mathbb{R}, \mathfrak{B}, \mu)$, where $\mathbb{R}$ is the real line, $\mathfrak{B}$ is the $\sigma$-algebra of Borel sets, and $\mu$ is
the Lebesgue measure, is an example of a space with a $\sigma$-finite measure.

We can also consider such set functions $\mu(A)$ that satisfy conditions (1) and (3)
only, but are not necessarily nonnegative. Such functions are called signed measures.
Any finite signed measure (i.e., such that $\sup_A \mu(A) < \infty$ and $\inf_A \mu(A) > -\infty$)
can be represented as a difference of two nonnegative measures (the Hahn decomposition
theorem, see Sect. 3.5 of the present appendix). We will need signed measures
in Sect. 3.5 only. Everywhere else, unless otherwise specified, by measures we will
understand set functions possessing properties (1)-(3).

In the same manner as when establishing the simplest properties of probability,
one easily establishes the following properties of measures:

(1)  $\mu(A) \le \mu(B)$ if $A \subset B$,

(2)  $\mu\big(\bigcup_j A_j\big) \le \sum_j \mu(A_j)$ for any $A_j$,

(3)  if $A_n \subset A_{n+1}$ and $\bigcup_n A_n = A$ then $\mu(A_n) \to \mu(A)$, or, which is the same,

(3')  if $A_n \supset A_{n+1}$, $\bigcap_n A_n = A$ and $\mu(A_1) < \infty$ then $\mu(A_n) \to \mu(A)$.

Consider further measurable functions on $(\Omega, \mathfrak{F})$, i.e., functions $\xi(\omega)$ having the
property $\{\omega : \xi(\omega) \in B\} \in \mathfrak{F}$ for any Borel subset $B$ of the real line.

The notions of convergence in measure and convergence almost everywhere are
introduced similarly to the case of probability measure.

We will say that a sequence of measurable functions $\xi_n$ converges to $\xi$ almost
everywhere (a.e.): $\xi_n \xrightarrow{\text{a.e.}} \xi$ as $n \to \infty$, if $\xi_n(\omega) \to \xi(\omega)$ for all $\omega$ except those from a set
of measure 0.

We will say that the $\xi_n$ converge to $\xi$ in measure: $\xi_n \xrightarrow{\mu} \xi$, if, for any $\varepsilon > 0$, as
$n \to \infty$,

$$\mu(\{|\xi_n - \xi| > \varepsilon\}) \to 0.$$

Now we turn to the construction of integrals and the study of their properties.
First we consider finite measures assuming them without loss of generality to be
probability measures. In that case we will write $\mathbf{P}(A)$ instead of $\mu(A)$. We will turn
to integrals with respect to arbitrary measures in Sect. 3.4.


3.2  The  Integral  with  Respect  to  a  Probability  Measure 
3.2.1  The  Integrals  of  a  Simple  Function 

A measurable function $\xi(\omega)$ is said to be simple if its range is finite. The indicator
of a set $F \in \mathfrak{F}$ is the simple function

$$I_F(\omega) = \begin{cases} 1 & \text{if } \omega \in F, \\ 0 & \text{if } \omega \notin F. \end{cases}$$

Clearly, any simple function $\xi(\omega)$ can be written in the form

$$\xi(\omega) = \sum_{k=1}^{n} x_k I_{F_k}(\omega),$$

where $x_k$, $k = 1, 2, \ldots, n$, are values assumed by $\xi$, and $F_k = \{\omega : \xi(\omega) = x_k\}$. The
sets $F_k \in \mathfrak{F}$ are disjoint, and $\bigcup_{k=1}^{n} F_k = \Omega$. The integral of the simple function $\xi(\omega)$
with respect to a measure $\mathbf{P}$ is defined as the quantity

$$\int \xi(\omega)\,d\mathbf{P}(\omega) = \sum_{k=1}^{n} x_k \mathbf{P}(F_k) = \mathbf{E}\xi.$$

The integral of the simple function $\xi(\omega)$ over a set $A \in \mathfrak{F}$ is defined as

$$\int_A \xi\,d\mathbf{P} := \int \xi(\omega) I_A(\omega)\,d\mathbf{P}(\omega).$$

That these definitions are consistent (the partitions into sets $F_k$ may be different)
can be verified in an obvious way.
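The definition of the integral of a simple function translates directly into a few lines of code. The Python sketch below assumes a three-point sample space with hypothetical probabilities `P` and a simple function `xi`; it groups the points into level sets and sums the values weighted by the probabilities of those sets.

```python
from fractions import Fraction

# Hypothetical three-point probability space and simple function.
P = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
xi = {"a": 2, "b": 5, "c": 2}


def integral(func, prob, over=None):
    """Sum of x_k * P(F_k intersected with A), where F_k = {func = x_k}.

    `over` is the set A; by default the whole sample space is used.
    """
    region = set(prob) if over is None else set(over)
    level_sets = {}
    for w, x in func.items():
        if w in region:
            level_sets.setdefault(x, set()).add(w)
    return sum((x * sum(prob[w] for w in F) for x, F in level_sets.items()),
               Fraction(0))


whole = integral(xi, P)                 # integral over the whole space
part = integral(xi, P, over={"a", "b"})  # integral over A = {a, b}
pointwise = sum((xi[w] * P[w] for w in P), Fraction(0))
```

Here `whole` and `pointwise` agree, which illustrates the remark that the value does not depend on how the space is partitioned.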


3.2.2  The  Integrals  of  an  Arbitrary  Function 

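Lemma A3.2.1  Let $\xi(\omega) \ge 0$. There exists a sequence $\xi_n(\omega)$ of simple functions
such that $\xi_n(\omega) \uparrow \xi(\omega)$ as $n \to \infty$ for all $\omega \in \Omega$.

Proof  Partition the segment $[0, n]$ into $n2^n$ equal intervals. Let

$$x_0 = 0, \quad x_1 = 2^{-n}, \quad \ldots, \quad x_{n2^n} = n$$

denote the partition points, so that $x_{i+1} - x_i = 2^{-n}$. Put

$$F_i := \{\omega : x_i \le \xi(\omega) < x_{i+1}\}, \qquad i = 1, 2, \ldots, n2^n - 1;$$

$$F_0 := \{0 \le \xi(\omega) < x_1\} \cup \{\xi(\omega) \ge n\}, \qquad \xi_n(\omega) := \sum_{i=0}^{n2^n - 1} x_i I_{F_i}(\omega) \le \xi(\omega).$$

The function $\xi_n(\omega)$ is clearly simple, $\xi_n(\omega) \le \xi_{n+1}(\omega) \le \xi(\omega)$ for all $\omega$, and has the
property that if $n > \xi(\omega)$ at a point $\omega \in \Omega$ then

$$0 \le \xi(\omega) - \xi_n(\omega) \le \frac{1}{2^n}.$$

The lemma is proved.  □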
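The construction used in the proof can be tried out numerically. The Python sketch below evaluates the approximating value at a single point: the value of the function is rounded down to the grid with step 2^(-n) when it is below n, and replaced by 0 otherwise (this corresponds to the set F_0 in the proof). The function name `xi_n` and the sample value are hypothetical.

```python
from fractions import Fraction
import math


def xi_n(value, n):
    """Value of the n-th simple approximation at a point where xi = value.

    Points with xi >= n fall into F_0 and receive the value x_0 = 0;
    otherwise xi is rounded down to the grid with step 2**-n.
    """
    if value >= n:
        return Fraction(0)
    return Fraction(math.floor(value * 2**n), 2**n)


value = Fraction(7, 3)  # hypothetical value xi(omega)
approximations = [xi_n(value, n) for n in range(1, 13)]

# The sequence is nondecreasing and stays below the true value.
monotone = all(a <= b for a, b in zip(approximations, approximations[1:]))
bounded = all(a <= value for a in approximations)

# Once n exceeds xi(omega), the error is at most 2**-n.
errors_ok = all(0 <= value - xi_n(value, n) <= Fraction(1, 2**n)
                for n in range(3, 13))
```

For this sample value the approximations are 0 for n = 1, 2 (since 7/3 is not below n) and then 9/4, 37/16, and so on.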


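Lemma A3.2.2  Let $\xi_n \uparrow \xi \ge 0$ and $\eta_n \uparrow \xi \ge 0$ be sequences of simple functions.
Then

$$\lim_{n\to\infty} \int \xi_n\,d\mathbf{P} = \lim_{n\to\infty} \int \eta_n\,d\mathbf{P}.$$

Proof  We verify that, for any $m$,

$$\int \xi_m\,d\mathbf{P} \le \lim_{n\to\infty} \int \eta_n\,d\mathbf{P}.$$

The function $\xi_m$ is simple. Therefore it is bounded by some constant: $\xi_m \le c_m$.
Hence, for any integer $n$ and $\varepsilon > 0$,

$$\xi_m - \eta_n \le c_m \cdot I_{\{\xi_m > \eta_n + \varepsilon\}} + \varepsilon.$$

This implies that

$$\mathbf{E}\xi_m \le c_m \mathbf{P}\{\xi_m > \eta_n + \varepsilon\} + \varepsilon + \mathbf{E}\eta_n.$$

The probability on the right-hand side vanishes as $n \to \infty$:

$$\mathbf{P}\{\xi_m > \eta_n + \varepsilon\} \le \mathbf{P}\{\xi > \eta_n + \varepsilon\} \to 0,$$

because $\eta_n$ converges almost surely (and hence in probability) to $\xi$. Therefore
$\mathbf{E}\xi_m \le \varepsilon + \lim_{n\to\infty} \mathbf{E}\eta_n$. Since $\varepsilon$ is arbitrary,

$$\lim_{n\to\infty} \mathbf{E}\xi_n \le \lim_{n\to\infty} \mathbf{E}\eta_n.$$

Swapping $\{\xi_n\}$ and $\{\eta_n\}$, we obtain the converse inequality.

The lemma is proved.  □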


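The assertions of Lemmas A3.2.1 and A3.2.2 make the following definitions
consistent.

The integral of a nonnegative measurable function $\xi(\omega)$ (with respect to measure
$\mathbf{P}$) is the quantity

$$\int \xi\,d\mathbf{P} = \lim_{n\to\infty} \int \xi_n\,d\mathbf{P},$$  (A3.2.1)

where $\xi_n$ is a sequence of simple functions such that $\xi_n \uparrow \xi$ as $n \to \infty$.

The integral $\int \xi\,d\mathbf{P}$ will also be denoted by $\mathbf{E}\xi$. We will say that the integral
$\int \xi\,d\mathbf{P}$ exists and $\xi$ is integrable if $\mathbf{E}\xi < \infty$.

The integral of an arbitrary function (assuming values of both signs) $\xi(\omega)$ (with
respect to measure $\mathbf{P}$) is the quantity

$$\mathbf{E}\xi = \mathbf{E}\xi^+ - \mathbf{E}\xi^-, \qquad \xi^{\pm} := \max(0, \pm\xi),$$

which is defined when at least one of the values $\mathbf{E}\xi^{\pm}$ is finite. Otherwise $\mathbf{E}\xi$ is
undefined. The integral $\mathbf{E}\xi$ exists if and only if $\mathbf{E}|\xi| < \infty$ (recall that $|\xi| = \xi^+ + \xi^-$).

If $\mathbf{E}\xi$ exists then

$$\mathbf{E}(\xi; A) := \int_A \xi\,d\mathbf{P} = \mathbf{E}\xi I_A$$

exists for any $A \in \mathfrak{F}$ as well.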


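Lemma A3.2.3  If $\mathbf{E}\xi$ exists and $B_n \in \mathfrak{F}$ is a sequence of sets such that $\mathbf{P}(B_n) \to 0$
as $n \to \infty$, then

$$\mathbf{E}(\xi; B_n) \to 0.$$

Proof  For any sequence $|\xi_m| \uparrow |\xi|$ of simple functions and $A_m := \{|\xi| \le m\}$ one has

$$\mathbf{E}|\xi| \ge \lim_{m\to\infty} \mathbf{E}|\xi| I_{A_m} \ge \lim_{m\to\infty} \mathbf{E}|\xi_m| I_{A_m} = \mathbf{E}|\xi|,$$

since $|\xi_m| I_{A_m} \uparrow |\xi|$. This implies that

$$\mathbf{E}|\xi| = \lim_{m\to\infty} \mathbf{E}|\xi| I_{A_m} = \lim_{m\to\infty} \mathbf{E}(|\xi|; |\xi| \le m),$$

and hence, for any $\varepsilon > 0$, there exists an $m(\varepsilon)$ such that

$$\mathbf{E}|\xi| - \mathbf{E}(|\xi|; |\xi| \le m) < \varepsilon$$

for $m \ge m(\varepsilon)$. Consequently, for such $m$, one has

$$\mathbf{E}(|\xi|; B_n) = \mathbf{E}(|\xi|; \{|\xi| \le m\}B_n) + \mathbf{E}(|\xi|; \{|\xi| > m\}B_n) \le m\mathbf{P}(B_n) + \varepsilon,$$

and hence

$$\limsup_{n\to\infty} \mathbf{E}(|\xi|; B_n) \le \varepsilon.$$

Since $\varepsilon$ is arbitrary and $|\mathbf{E}(\xi; B_n)| \le \mathbf{E}(|\xi|; B_n)$, the lemma is proved.  □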

Note that Lemma 6.1.2 somewhat extends Lemma A3.2.3.

Corollary A3.2.1  If $\mathbf{E}\xi$ is well-defined (the values $\pm\infty$ not being excluded) and
$B_n \in \mathfrak{F}$ is a sequence of sets such that $\mathbf{P}(B_n) \to 1$ as $n \to \infty$, then

$$\mathbf{E}(\xi; B_n) \to \mathbf{E}\xi.$$

Proof  If $\mathbf{E}\xi$ exists then the required assertion follows from Lemma A3.2.3.

Now let $\mathbf{E}\xi = \infty$. Then $\mathbf{E}\xi^- < \infty$ and $\mathbf{E}\xi^+ = \infty$, where $\xi^{\pm} = \max(0, \pm\xi)$. It
follows that $\mathbf{E}(\xi^-; B_n) \to \mathbf{E}\xi^-$ as $n \to \infty$. We show that

$$\mathbf{E}(\xi^+; B_n) \to \infty.$$  (A3.2.2)

Let $A_k := \{\xi \in [2^{k-1}, 2^k)\}$, $k = 1, 2, \ldots$; $p_k := \mathbf{P}(A_k)$. We can assume without
loss of generality that all $p_k > 0$ (if this is not the case we can consider a
subsequence such that all $p_{k_j} > 0$). Since $\mathbf{E}\xi^+ \le 1 + \sum_{k=1}^{\infty} 2^k p_k$, we have
$\sum_{k=1}^{\infty} 2^k p_k = \infty$. For a given $N \ge 1$, choose $n$ large enough such that $\mathbf{P}(B_nA_k) >
p_k/2$ for all $k \le N$. Then

$$\mathbf{E}(\xi^+; B_n) \ge \sum_{k=1}^{N} 2^{k-2} p_k,$$

where the right-hand side can be made arbitrarily large by an appropriate choice
of $N$. This proves (A3.2.2). Since $\xi = \xi^+ - \xi^-$, the required convergence is proved.
The case $\mathbf{E}\xi = -\infty$ can be dealt with in the same way. The corollary is proved.  □




3.2.3  Properties  of  Integrals 


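I1.  If sets $A_j \in \mathfrak{F}$ are disjoint and $\bigcup_j A_j = \Omega$ then

$$\int \xi\,d\mathbf{P} = \sum_j \int_{A_j} \xi\,d\mathbf{P}.$$  (A3.2.3)

Proof  It suffices to prove this relation for $\xi(\omega) \ge 0$. For simple functions equality
(A3.2.3) is obvious, because

$$\int \xi\,d\mathbf{P} = \sum_k x_k \mathbf{P}(\xi = x_k) = \sum_j \sum_k x_k \mathbf{P}(\{\xi = x_k\}A_j).$$

In the general case, using definition (A3.2.1) one gets

$$\int \xi\,d\mathbf{P} = \lim_{n\to\infty} \int \xi_n\,d\mathbf{P} = \lim_{n\to\infty} \sum_j \int_{A_j} \xi_n\,d\mathbf{P}
= \sum_j \lim_{n\to\infty} \int_{A_j} \xi_n\,d\mathbf{P} = \sum_j \int_{A_j} \xi\,d\mathbf{P}.$$  (A3.2.4)

Swapping summation and passage to the limit is justified here, for by Lemma A3.2.3

$$\sum_{j=N}^{\infty} \int_{A_j} \xi_n\,d\mathbf{P} = \mathbf{E}\Big(\xi_n; \bigcup_{j=N}^{\infty} A_j\Big) \le \mathbf{E}\Big(\xi; \bigcup_{j=N}^{\infty} A_j\Big) \to 0$$

as $N \to \infty$ uniformly in $n$.  □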


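I2.

$$\int (\xi + \eta)\,d\mathbf{P} = \int \xi\,d\mathbf{P} + \int \eta\,d\mathbf{P}.$$

Proof  For simple functions this property is obvious. Hence, for $\xi \ge 0$ and $\eta \ge 0$,
this property follows from the additivity of the limit.

In the general case we have ($\xi^{\pm}$ and $\eta^{\pm}$ are defined here as before)

$$\int (\xi + \eta)\,d\mathbf{P} = \int (\xi^+ + \eta^+)\,d\mathbf{P} - \int (\xi^- + \eta^-)\,d\mathbf{P}
= \int \xi^+\,d\mathbf{P} - \int \xi^-\,d\mathbf{P} + \int \eta^+\,d\mathbf{P} - \int \eta^-\,d\mathbf{P} = \int \xi\,d\mathbf{P} + \int \eta\,d\mathbf{P}.$$

□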


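I3.  If $c$ is an arbitrary constant, then

$$\int c\xi\,d\mathbf{P} = c \int \xi\,d\mathbf{P}.$$

I4.  If $\xi \le \eta$, then $\int \xi\,d\mathbf{P} \le \int \eta\,d\mathbf{P}$.

The proof of properties I3 and I4 is obvious. Since

$$\int \xi\,d\mathbf{P} = \mathbf{E}\xi,$$

we can write down properties I1-I4 in terms of expectations as follows:

I1.  $\mathbf{E}\xi = \sum_j \mathbf{E}(\xi; A_j)$ if $A_j$ are disjoint and $\bigcup_j A_j = \Omega$.

I2.  $\mathbf{E}(\xi + \eta) = \mathbf{E}\xi + \mathbf{E}\eta$.

I3.  $\mathbf{E}a\xi = a\mathbf{E}\xi$.

I4.  $\mathbf{E}\xi \le \mathbf{E}\eta$, if $\xi \le \eta$.

Note also the following properties of integrals which easily follow from I1-I4.

I5.  $|\mathbf{E}\xi| \le \mathbf{E}|\xi|$.

I6.  If $c_1 \le \xi \le c_2$, then $c_1 \le \mathbf{E}\xi \le c_2$.

I7.  If $\xi \ge 0$ and $\mathbf{E}\xi = 0$, then $\mathbf{P}(\xi = 0) = 1$.

This property follows from the Chebyshev inequality: $\mathbf{P}(\xi \ge \varepsilon) \le \mathbf{E}\xi/\varepsilon = 0$ for
any $\varepsilon > 0$.

I8.  If $\mathbf{P}(\xi = \eta) = 1$ and $\mathbf{E}\xi$ exists then $\mathbf{E}\xi = \mathbf{E}\eta$.

Indeed,

$$\mathbf{E}\eta = \lim_{n\to\infty} \mathbf{E}(\eta; |\eta| \le n) = \lim_{n\to\infty} \mathbf{E}(\xi; |\xi| \le n) = \mathbf{E}\xi.$$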


3.3  Further  Properties  of  Integrals 

3.3.1  Convergence  Theorems 

A number of convergence theorems were proved in Sect. 6.1. One of them was the
dominated convergence theorem (Corollary 6.1.3):

If $\xi_n \xrightarrow{p} \xi$ as $n \to \infty$ and $|\xi_n| \le \eta$, $\mathbf{E}\eta < \infty$, then the expectation $\mathbf{E}\xi$ exists and
$\mathbf{E}\xi_n \to \mathbf{E}\xi$.

Now we will present some further useful assertions concerning convergence of
integrals.

Theorem A3.3.1 (Monotone convergence)  If $0 \le \xi_n \uparrow \xi$, then $\mathbf{E}\xi = \lim_{n\to\infty} \mathbf{E}\xi_n$.

Proof  In addition to Corollary 6.1.3, here we only need to prove that $\mathbf{E}\xi_n \to \infty$
if $\mathbf{E}\xi = \infty$. Put $\xi_n^N := \min(\xi_n, N)$ and $\xi^N := \min(\xi, N)$. Then clearly $\xi_n^N \uparrow \xi^N$ as
$n \to \infty$, and $\mathbf{E}\xi_n^N \uparrow \mathbf{E}\xi^N$. Therefore the value $\mathbf{E}\xi_n^N \le \mathbf{E}\xi_n$ can be made arbitrarily
large by choosing appropriate $n$ and $N$. The theorem is proved.  □

These theorems can be generalised in the following way. To make the extension
of the convergence theorems to the case of integrals with respect to signed measures
in Sect. 3.4 more convenient, we will now write $\mathbf{E}\xi$ in the form of the integral
$\int \xi\,d\mathbf{P}$.


Theorem A3.3.2 (Fatou-Lebesgue)  Let $\eta$ and $\zeta$ be integrable. If $\xi_n \le \eta$ then

$$\limsup_{n\to\infty} \int \xi_n\,d\mathbf{P} \le \int \limsup_{n\to\infty} \xi_n\,d\mathbf{P}.$$  (A3.3.1)

If $\xi_n \ge \zeta$ then

$$\liminf_{n\to\infty} \int \xi_n\,d\mathbf{P} \ge \int \liminf_{n\to\infty} \xi_n\,d\mathbf{P}.$$  (A3.3.2)

If $\xi_n \uparrow \xi$ and $\xi_n \ge \zeta$, or $\xi_n \xrightarrow{\text{a.e.}} \xi$ and $\zeta \le \xi_n \le \eta$, then

$$\lim_{n\to\infty} \int \xi_n\,d\mathbf{P} = \int \xi\,d\mathbf{P}.$$  (A3.3.3)

Proof  We prove for instance (A3.3.2). Assume without loss of generality that $\zeta = 0$.
In this case, as $n \to \infty$,

$$\xi_n \ge \eta_n := \inf_{k \ge n} \xi_k \uparrow \liminf_{k\to\infty} \xi_k \ge 0,$$

and by the monotone convergence theorem

$$\liminf_{n\to\infty} \int \xi_n\,d\mathbf{P} \ge \lim_{n\to\infty} \int \eta_n\,d\mathbf{P} = \int \liminf_{n\to\infty} \xi_n\,d\mathbf{P}.$$

Applying (A3.3.2) to the sequence $\eta - \xi_n$ we obtain (A3.3.1); (A3.3.3) follows from
the previous theorems. The theorem is proved.  □
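The inequality in the Fatou-type bound for lower limits can be strict. A standard illustration (not taken from the text) is the sequence equal to n on the interval (0, 1/n) and 0 elsewhere, on the unit interval with Lebesgue measure: every integral equals 1, while the pointwise lower limit is 0. The Python sketch below encodes this example with exact arithmetic; the function names are hypothetical.

```python
from fractions import Fraction


def xi(n, w):
    """xi_n(w) = n on the interval (0, 1/n) and 0 elsewhere."""
    return n if 0 < w < Fraction(1, n) else 0


def expectation(n):
    """Integral of xi_n over [0, 1]: height n times interval length 1/n."""
    return n * Fraction(1, n)


# Every integral equals 1, so the lower limit of the integrals is 1.
integrals = [expectation(n) for n in range(1, 50)]

# At any fixed point w > 0 the values vanish once n >= 1/w,
# so the pointwise lower limit is 0 and its integral is 0.
w = Fraction(1, 10)
tail_values = [xi(n, w) for n in range(10, 50)]
```

Thus the lower limit of the integrals is 1 while the integral of the lower limit is 0, in agreement with the inequality and showing that equality need not hold without a dominating bound.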


3.3.2  Connection  to  Integration  with  Respect  to  a  Measure  on  the 
Real  Line 

Let $g(x)$ be a Borel function given on the real line $\mathbb{R}$ (if $\mathfrak{B}$ is the $\sigma$-algebra of Borel
sets on the line and $B \in \mathfrak{B}$, then $\{x : g(x) \in B\} \in \mathfrak{B}$). If $\xi$ is a random variable
then $\eta := g(\xi(\omega))$ will clearly also be a random variable. As we saw in Sect. 3.2,
a random variable $\xi$ induces the probability space $(\mathbb{R}, \mathfrak{B}, \mathbf{F}_\xi)$ with measure $\mathbf{F}_\xi$ on
the line such that $\mathbf{F}_\xi(B) = \mathbf{P}(\xi \in B)$. Therefore one can speak about integrals with
respect to that measure.

Theorem A3.3.3  If $\eta = g(\xi(\omega))$ and $\mathbf{E}\eta$ exists, then

$$\mathbf{E}\eta = \int_\Omega \eta\,d\mathbf{P} = \int_{\mathbb{R}} g(x)\,\mathbf{F}_\xi(dx)$$

(on the right-hand side we used a somewhat different notation for $\int g\,d\mathbf{F}_\xi$).

Proof  Let first $g(x) = I_B(x)$ be the indicator of a set $B \in \mathfrak{B}$. Then $\eta = g(\xi(\omega)) =
I_{\{\xi \in B\}}(\omega)$ and $\mathbf{E}\eta = \mathbf{P}(\xi \in B)$. Therefore

$$\int g(x)\,\mathbf{F}_\xi(dx) = \int I_B(x)\,\mathbf{F}_\xi(dx) = \mathbf{F}_\xi(B) = \mathbf{P}(\xi \in B) = \mathbf{E}\eta.$$

Using the properties of the integral it is easy to establish that the assertion of the
theorem holds for simple functions $g$. Passing to the limit extends that assertion to
bounded functions. Now let $g \ge 0$. If the function $g(\xi)I_B(\xi) = \eta(\omega)I_{\{\xi \in B\}}(\omega)$ is
bounded, then

$$\int_B g(x)\,\mathbf{F}_\xi(dx) = \mathbf{E}(\eta; \xi \in B).$$

Therefore

$$\int_{\{g \le n\}} g\,d\mathbf{F}_\xi = \mathbf{E}(\eta; \eta \le n).$$

Passing to the limit as $n \to \infty$ we get the assertion of the theorem. Considering the
case when $g$ takes values of both signs does not create any difficulties. The theorem
is proved.  □
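For a random variable with finitely many values, the equality between the integral over the sample space and the integral with respect to the induced distribution on the line reduces to regrouping a finite sum. The Python sketch below checks this on a hypothetical three-point space with the function g(x) = x^2; all names and numbers are assumptions made for the illustration.

```python
from fractions import Fraction

# Hypothetical probability space, random variable and Borel function.
P = {"w1": Fraction(1, 4), "w2": Fraction(1, 4), "w3": Fraction(1, 2)}
xi = {"w1": -1, "w2": 1, "w3": 2}


def g(x):
    return x * x


def expectation_on_omega():
    """Integral of eta = g(xi) over the sample space."""
    return sum((g(xi[w]) * p for w, p in P.items()), Fraction(0))


def distribution():
    """Induced distribution of xi on the line: x -> P(xi = x)."""
    dist = {}
    for w, p in P.items():
        dist[xi[w]] = dist.get(xi[w], Fraction(0)) + p
    return dist


def expectation_on_line():
    """Integral of g with respect to the distribution of xi."""
    return sum((g(x) * mass for x, mass in distribution().items()),
               Fraction(0))
```

Both functions return the same number, since the second merely groups the sample points by the value of the random variable.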


Introducing the notation

$$F_\xi(x) = \mathbf{P}(\xi < x),$$

we can also consider, along with the integral just discussed,

$$\int_{\mathbb{R}} g(x)\,\mathbf{F}_\xi(dx),$$  (A3.3.4)

the Riemann-Stieltjes integral

$$\int g(x)\,dF_\xi(x),$$  (A3.3.5)

the definition of which was given in Sect. 3.6. It was also shown there that, for continuous
functions $g(x)$, these integrals coincide. Moreover, we discussed in Sect. 3.6
some other conditions for these integrals to coincide.

Also recall that if

$$F_\xi(x) = \int_{-\infty}^{x} f_\xi(t)\,dt$$

and the functions $g(x)$ and $f_\xi(x)$ are Riemann integrable, then integrals (A3.3.4)
and (A3.3.5) coincide with the Riemann integral

$$\int g(x) f_\xi(x)\,dx.$$

3.3.3  Product Measures and Iterated Integrals

Consider a two-dimensional random variable $\zeta = (\xi, \eta)$ given on $(\Omega, \mathfrak{F}, \mathbf{P})$. The
random variables $\xi$ and $\eta$ induce a sample probability space $(\mathbb{R}^2, \mathfrak{B}^2, \mathbf{F}_{\xi,\eta})$ with the
measure $\mathbf{F}_{\xi,\eta}$ given on elements of the $\sigma$-algebra $\mathfrak{B}^2$ of Borel sets on the plane (the
$\sigma$-algebra generated by rectangles) and such that

$$\mathbf{F}_{\xi,\eta}(A \times B) = \mathbf{P}(\xi \in A, \eta \in B).$$

Here $A \times B$ is the set of points $(x, y)$ for which $x \in A$ and $y \in B$. If $g(x, y)$ is a
Borel function ($\{(x, y) : g(x, y) \in B\} \in \mathfrak{B}^2$ for each $B \in \mathfrak{B}$), then it easily follows
from the above that

$$\mathbf{E}g(\xi, \eta) = \int_{\mathbb{R}^2} g(x, y)\,\mathbf{F}_{\xi,\eta}(dx\,dy),$$

since both integrals are equal to $\int_{\mathbb{R}} x\,\mathbf{F}_\theta(dx)$ for $\theta = g(\xi, \eta)$.
Now let $\xi$ and $\eta$ be independent random variables, i.e.

$$\mathbf{P}(\xi \in A, \eta \in B) = \mathbf{P}(\xi \in A)\mathbf{P}(\eta \in B)$$  (A3.3.6)

for any $A, B \in \mathfrak{B}$.

Theorem A3.3.4 (Fubini's theorem on iterated integrals)  If $g(x, y) \ge 0$ is a Borel
function and $\xi$ and $\eta$ are independent, then

$$\mathbf{E}g(\xi, \eta) = \mathbf{E}\big[\mathbf{E}g(x, \eta)\big|_{x=\xi}\big].$$

For arbitrary Borel functions $g(x, y)$ the above equality holds if $\mathbf{E}g(\xi, \eta)$ exists.

This very assertion we stated in Chap. 3 in the form

$$\int g(x, y)\,\mathbf{F}_{\xi,\eta}(dx\,dy) = \int \Big[\int g(x, y)\,\mathbf{F}_\eta(dy)\Big]\mathbf{F}_\xi(dx).$$  (A3.3.7)

We will need the following.


Lemma A3.3.1  1. The section

$$B_x := \{y : (x, y) \in B\}$$

of any set $B \in \mathfrak{B}^2$ is measurable: $B_x \in \mathfrak{B}$.

2. The section $g_x(y) = g(x, y)$ of any Borel function $g$ ($\mathfrak{B}^2$-measurable) is a
Borel function.

3. The integral

$$\int g(x, y)\,\mathbf{F}_\eta(dy)$$  (A3.3.8)

of a Borel function $g$ is a Borel function of $x$.

Proof  1. Let $\mathcal{K}_1$ be the class of all sets from $\mathfrak{B}^2$ of which all $x$-sections are measurable.
It is evident that $\mathcal{K}_1$ contains all rectangles $B = B_{(1)} \times B_{(2)}$, where $B_{(1)} \in \mathfrak{B}$
and $B_{(2)} \in \mathfrak{B}$. Moreover, $\mathcal{K}_1$ is a $\sigma$-algebra. Indeed, consider for example the set
$B = \bigcup_k B^{(k)}$, where $B^{(k)} \in \mathcal{K}_1$. The operation $\bigcup$ on the sets $B^{(k)}$ leads to the same
operation on their sections, so that $B_x = \bigcup_k B_x^{(k)} \in \mathfrak{B}$. For the other operations
($\bigcap$ and taking complements) the situation is similar. Thus, $\mathcal{K}_1$ is a $\sigma$-algebra containing
all rectangles. This means that $\mathfrak{B}^2 \subset \mathcal{K}_1$.

2. For $B \in \mathfrak{B}$, one has

$$g_x^{-1}(B) = \{y : g_x(y) \in B\} = \{y : g(x, y) \in B\}
= \{y : (x, y) \in g^{-1}(B)\} = \big(g^{-1}(B)\big)_x \in \mathfrak{B}.$$

3. Integral (A3.3.8) is, as a function of $x$, the result of passing to the limit in
a sequence of measurable functions, and hence is measurable itself. The lemma is
proved.  □


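Proof of Theorem A3.3.4  First we prove (A3.3.7) in the case where $g(x, y) =
I_B(x, y)$, so that the theorem turns into the formula for consecutive computation
of the measure of the set $B \in \mathfrak{B}^2$:

$$\mathbf{P}\big((\xi, \eta) \in B\big) = \int \mathbf{P}\big((x, \eta) \in B\big)\,\mathbf{F}_\xi(dx) = \int \mathbf{F}_\eta(B_x)\,\mathbf{F}_\xi(dx).$$  (A3.3.9)

We introduce the set function

$$\mathbf{Q}(B) := \int \mathbf{F}_\eta(B_x)\,\mathbf{F}_\xi(dx).$$

Clearly, $\mathbf{Q}(B) \ge 0$ and $\mathbf{Q}(\varnothing) = 0$. Further, if $B = \bigcup_k B^{(k)}$ and $B^{(k)}$ are disjoint,
then $B_x = \bigcup_k B_x^{(k)}$ and $B_x^{(k)}$ are also disjoint, and

$$\mathbf{Q}(B) = \int \mathbf{F}_\eta\Big(\bigcup_k B_x^{(k)}\Big)\,\mathbf{F}_\xi(dx) = \sum_k \int \mathbf{F}_\eta(B_x^{(k)})\,\mathbf{F}_\xi(dx) = \sum_k \mathbf{Q}(B^{(k)}).$$

This means that $\mathbf{Q}(B)$ is a measure.

The measure $\mathbf{Q}(B)$ coincides with $\mathbf{F}_{\xi,\eta}(B) = \mathbf{P}((\xi, \eta) \in B)$ on rectangles $B =
B_{(1)} \times B_{(2)}$. Indeed, for rectangles,

$$B_x = \begin{cases} B_{(2)} & \text{for } x \in B_{(1)}, \\ \varnothing & \text{for } x \notin B_{(1)}, \end{cases}$$

and

$$\mathbf{P}\big((\xi, \eta) \in B\big) = \mathbf{F}_\xi(B_{(1)})\mathbf{F}_\eta(B_{(2)})
= \int_{B_{(1)}} \mathbf{F}_\eta(B_{(2)})\,\mathbf{F}_\xi(dx) = \int \mathbf{F}_\eta(B_x)\,\mathbf{F}_\xi(dx) = \mathbf{Q}(B).$$

This means that the measures $\mathbf{Q}$ and $\mathbf{F}_{\xi,\eta}$ coincide on the algebra generated by
rectangles. By the measure extension theorem we obtain that $\mathbf{Q} = \mathbf{F}_{\xi,\eta}$.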

We  have  proved  (A3. 3. 9).  This  implies  that  Fubini’s  theorem  holds  for  simple 
functions  gN  =  c  Ax,  >  because 

N 

E g;v(£,  rj)  =  ^ ~2cjElAj(' ’n) 

7  =  1 

N 

=  Hcj 

7  =  1 

Now  if  g  >  0  is  an  arbitrary  Borel  function  then  there  exists  a  sequence  of  simple 
functions  gN  I'  g  and,  as  in  (A3. 2.1),  it  remains  to  pass  to  the  limit: 

E g($,  rf)  =  lim  Eg;v(£,  17) 

N^OO 

=  lim  f  EgN(%,  r))Fj:(dx)  =  f  EgN(^ ,  r])F^(dx). 

N^ooJ  J 

For  an  arbitrary  function  g  one  has  to  use  the  representation  g  =  g+  ~g  ,g+>0, 
g~  >  0.  The  theorem  is  proved.  □ 

Remark  A3. 1  We  see  from  the  proof  of  the  theorem  that  the  random  variables  §  and 
rj  do  not  need  to  be  scalar.  The  assertion  remains  true  in  a  more  general  form  (see 
property  5A  in  Sect.  4.8)  and,  in  particular,  for  vector- valued  §  and  77. 


J  EIaj(x,  rj)¥^(dx)  =  J  EgN(x,  r])¥^(dx)( A3. 3 AO) 
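Formula (A3.3.9) can be checked by hand in a discrete setting. The following is a minimal sketch, assuming two independent variables with small finite supports; the distributions `F_xi`, `F_eta` and the set `B` are illustrative values that do not come from the text.

```python
from fractions import Fraction

# Hypothetical distributions of independent xi and eta (illustrative values only).
F_xi = {0: Fraction(1, 4), 1: Fraction(3, 4)}
F_eta = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}

# A set B in the plane that is not a rectangle.
B = {(0, 0), (0, 2), (1, 1)}


def section(B, x):
    """Return the x-section B_x = {y : (x, y) in B}."""
    return {y for (u, y) in B if u == x}


def direct_probability(B):
    """P((xi, eta) in B), computed point by point using independence."""
    return sum(F_xi[x] * F_eta[y] for (x, y) in B)


def iterated_probability(B):
    """Q(B): the sum over x of F_eta(B_x) * F_xi({x}), as in (A3.3.9)."""
    return sum(sum(F_eta[y] for y in section(B, x)) * px for x, px in F_xi.items())
```

Both functions return the same exact fraction for this `B`, which is what the equality $Q = F_{\xi,\eta}$ asserts for every Borel set.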


3.4  The  Integral  with  Respect  to  an  Arbitrary  Measure 

If $\mu$ is a finite measure on $\langle \Omega, \mathfrak{F} \rangle$, $\mu(\Omega) < \infty$, then the definition of the integral
$\int \xi\, d\mu$ with respect to the measure $\mu$ does not differ from that of the integral with
respect to a probability measure (one could just put $\int_A \xi\, d\mu = \mu(\Omega) \int_A \xi\, d\mathbf{P}$, where
$\mathbf{P}(B) = \mu(B)/\mu(\Omega)$ is a probability distribution). If $\mu$ is $\sigma$-finite and $\mu(\Omega) = \infty$,
then the situation is somewhat more complicated, although it can still be reduced to
the already used constructions. First we will make several preliminary remarks.

Let $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$ be a probability space and $f = f(\omega) \ge 0$ an a.e. finite nonnegative
measurable function (i.e., a random variable). Consider the set function

$$\mu(A) := \int_A f\, d\mathbf{P}. \tag{A3.4.1}$$

If $f$ is integrable ($\mu(\Omega) < \infty$) then $\mu(A)$ is a finite $\sigma$-additive measure (see
property I1) satisfying conditions (1)-(3) of Sect. 3.1 of the present appendix. In other
words, $\mu$ is a finite measure on $\langle \Omega, \mathfrak{F} \rangle$. But if $f$ is not integrable, then $\mu$ is a $\sigma$-finite
measure, which immediately follows from the representation

$$\mu(A) = \sum_{k=1}^{\infty} \int_{A \cap \{k-1 \le f < k\}} f\, d\mathbf{P}$$

(the integrals in the sum that are equal to $\int_A f\, I_{\{k-1 \le f < k\}}\, d\mathbf{P}$ are clearly finite
measures).

Thus, the integral of the form (A3.4.1) is a measure for any distribution $\mathbf{P}$ and
function $f \ge 0$. It turns out that the following assertion, converse in a certain sense
to the above, also holds.


Lemma A3.4.1 For any measure $\mu$ on $\langle \Omega, \mathfrak{F} \rangle$, there exists a distribution $\mathbf{P}$ on that
space and a measurable function $f \ge 0$ such that representation (A3.4.1) holds.


Thus, any measure can be represented as an integral with respect to a probability
measure (i.e., in the form $\mathbf{E}(f; A)$ for the respective function $f$ and distribution $\mathbf{P}$).


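Proof Let $\mu$ be a $\sigma$-finite measure on $\langle \Omega, \mathfrak{F} \rangle$, and sets $B_j \in \mathfrak{F}$, $j = 1, 2, \ldots$, possess
the properties $\bigcup_j B_j = \Omega$, $B_i B_j = \emptyset$ for $i \ne j$, and $\mu(B_j) < \infty$. Put

$$\mathbf{P}(A) := \sum_{k=1}^{\infty} \frac{\mu(A B_k)}{2^k \mu(B_k)}. \tag{A3.4.2}$$

Obviously, $\mathbf{P}(\Omega) = 1$ and $\mathbf{P}$ is a measure. Further, if $A \subset B_k$ then

$$\mu(A) = 2^k \mu(B_k)\, \mathbf{P}(A).$$

This means that we should put $f(\omega) := 2^k \mu(B_k)$ for $\omega \in B_k$. Then the set function

$$\lambda(A) := \int_A f\, d\mathbf{P} = \int_\Omega f I_A\, d\mathbf{P}$$

will coincide with $\mu(A)$:

$$\lambda(A) = \sum_{k=1}^{\infty} 2^k \mu(B_k)\, \mathbf{P}(A B_k)
= \sum_{k=1}^{\infty} 2^k \mu(B_k) \sum_{j=1}^{\infty} \frac{\mu(A B_k B_j)}{2^j \mu(B_j)}
= \sum_{k=1}^{\infty} \mu(A B_k) = \mu(A).$$

The lemma is proved. □

Besides the required assertion, we also obtain that in representation (A3.4.1) the
range of values of the function $f$ can be assumed without loss of generality to be
countable.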
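The construction (A3.4.2) can be traced on a concrete $\sigma$-finite measure. The sketch below assumes a weighted counting measure on the positive integers with the one-point blocks $B_k = \{k\}$; the weight function `w` is a hypothetical choice made only for illustration.

```python
from fractions import Fraction


def w(k):
    """Hypothetical weights: mu({k}) = k, so mu has infinite total mass."""
    return Fraction(k)


def mu(A):
    """mu(A) for a finite set A of positive integers."""
    return sum(w(k) for k in A)


def P_point(k):
    """P({k}) from (A3.4.2) with blocks B_k = {k}; this equals 2^{-k}."""
    return w(k) / (2**k * w(k))


def f(k):
    """The density f = 2^k * mu(B_k) on the block B_k."""
    return 2**k * w(k)


def lam(A):
    """lambda(A): the integral of f over A with respect to P, a finite sum here."""
    return sum(f(k) * P_point(k) for k in A)
```

For any finite `A` the values `lam(A)` and `mu(A)` agree exactly, and the partial sums of `P_point` approach 1, in line with $\mathbf{P}(\Omega) = 1$.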




The function $f$ for which equality (A3.4.1) holds is called the density of the
measure $\mu$ with respect to $\mathbf{P}$ (or Radon-Nikodym derivative of the measure $\mu$ with
respect to $\mathbf{P}$) and is denoted by $d\mu/d\mathbf{P}$. It is evident that alteration of the function
$f = d\mu/d\mathbf{P}$ on a set of zero $\mathbf{P}$-measure leaves the equality (A3.4.1) unchanged.

Now let $\mu$ and $\mathbf{P}$ be two given arbitrary measures. The question of under what
conditions these two measures $\mu$ and $\mathbf{P}$ could be related by (A3.4.1) and whether
the function $f$ is determined uniquely thereby (up to values on a set of zero
$\mathbf{P}$-measure) is rather important for probability theory. (We stress that, in the preceding
considerations, the measure $\mathbf{P}$ was constructed in a special way from the measure
$\mu$, or vice versa.) Answers to these questions are given by the Radon-Nikodym
theorem to be discussed in the next section.

Now, using the simple assertion of Lemma A3.4.1 we have just proved, we will
give the definition of the integral with respect to an arbitrary measure $\mu$.

Let $\mu$ be an arbitrary $\sigma$-finite measure on $\langle \Omega, \mathfrak{F} \rangle$ and $\xi \ge 0$ an $\mathfrak{F}$-measurable
function.

The integral $\int_A \xi\, d\mu$ over a set $A \in \mathfrak{F}$ of the function $\xi \ge 0$ with respect to the
measure $\mu$ is the integral

$$\int_A \xi\, \frac{d\mu}{d\mathbf{P}}\, d\mathbf{P} \tag{A3.4.3}$$

with respect to any distribution $\mathbf{P}$ satisfying equality (A3.4.1) (for example, with
respect to measure (A3.4.2)).

This definition is consistent because it does not depend on the choice of $\mathbf{P}$.
Indeed, for simple functions $\xi$ ($\xi(\omega) = x_k$ for $\omega \in F_k$),

$$\int_A \xi\, \frac{d\mu}{d\mathbf{P}}\, d\mathbf{P} = \sum_k x_k\, \mu(A F_k).$$

If now $\xi \ge 0$ is an arbitrary function, then by the monotone convergence theorem
the integral $\int_A \xi\, d\mu$ is equal to

$$\lim_{n \to \infty} \int_A \xi^{(n)}\, \frac{d\mu}{d\mathbf{P}}\, d\mathbf{P} = \lim_{n \to \infty} \int_A \xi^{(n)}\, d\mu,$$

where $\xi^{(n)} \uparrow \xi$ is a sequence of simple functions which converge monotonically to
$\xi$ (see Lemma A3.2.1). In both cases, the result does not depend on the choice of $\mathbf{P}$.
The integral

$$\int_A \xi\, d\mu$$

of an arbitrary measurable function $\xi$ is defined by

$$\int_A \xi\, d\mu = \int_A \xi^+\, d\mu - \int_A \xi^-\, d\mu,$$

when both expressions on the right-hand side are finite. (In that case one says
that the integral $\int_A \xi\, d\mu$ exists.) Here, as before, $\xi^+ = \max(0, \xi) \ge 0$ and
$\xi^- = \max(0, -\xi) \ge 0$, so that $\xi = \xi^+ - \xi^-$.
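That definition (A3.4.3) does not depend on the choice of $\mathbf{P}$ is easy to see on a finite space. The sketch below assumes a three-point space; the measure `mu`, the function `xi` and the two distributions `P1`, `P2` are hypothetical values chosen for illustration.

```python
from fractions import Fraction

# Hypothetical measure mu and function xi on the three-point space {0, 1, 2}.
mu = {0: Fraction(2), 1: Fraction(5), 2: Fraction(3)}
xi = {0: Fraction(1), 1: Fraction(4), 2: Fraction(2)}

# Two different distributions, both charging every point.
P1 = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}
P2 = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}


def integral(xi, A, P):
    """Integral of xi over A w.r.t. mu, computed as in (A3.4.3) through P."""
    density = {w: mu[w] / P[w] for w in mu}  # d mu / d P at each point
    return sum(xi[w] * density[w] * P[w] for w in A)
```

Here `integral(xi, {0, 2}, P1)` and `integral(xi, {0, 2}, P2)` coincide, because the factor `P[w]` cancels against the density in each summand.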

Thus we see that the above definition of the integral with respect to an arbitrary
measure is essentially equivalent to the construction used in Sect. 3.2 of the present
appendix. However, the definition in the form (A3.4.3) saves us from the necessity
of repeating what we have already done (and now in a more complex setting) and
enables one to transfer all the properties of the integrals $\int \xi\, d\mathbf{P}$ to the general case.
We will list the basic properties preserving the existing numeration.

I1. $\int \xi\, d\mu = \sum_j \int_{A_j} \xi\, d\mu$ if $A_j$ are disjoint and $\bigcup_j A_j = \Omega$.

I2. $\int (\xi + \eta)\, d\mu = \int \xi\, d\mu + \int \eta\, d\mu$.

I3. $\int a\xi\, d\mu = a \int \xi\, d\mu$.

I4. $\int \xi\, d\mu \le \int \eta\, d\mu$ if $\xi \le \eta$.

I5. $\bigl| \int \xi\, d\mu \bigr| \le \int |\xi|\, d\mu$.

I6. If $c_1 \le \xi(\omega) \le c_2$ for $\omega \in A$, then $c_1 \mu(A) \le \int_A \xi\, d\mu \le c_2 \mu(A)$.

I7. If $\xi \ge 0$ and $\int \xi\, d\mu = 0$, then $\mu(\xi > 0) = 0$.

I8. If $\mu(\xi \ne \eta) = 0$, then $\int \xi\, d\mu = \int \eta\, d\mu$.

It is clear that all the convergence theorems remain valid as well.


Theorem A3.4.1 (The dominated convergence theorem) Let $|\xi_n| \le \eta$ and $\int \eta\, d\mu$
exist. If $\xi_n \xrightarrow{\mu} \xi$ or $\xi_n \to \xi$ a.e. as $n \to \infty$ then

$$\int \xi_n\, d\mu \to \int \xi\, d\mu.$$


Theorem A3.4.2 (The monotone convergence theorem) If $0 \le \xi_n \uparrow \xi$ as $n \to \infty$
then

$$\int \xi_n\, d\mu \to \int \xi\, d\mu.$$


Theorem A3.4.3 (Fatou-Lebesgue) The statement and proof of this theorem is
obtained from those of Theorem A3.3.2 by replacing $\mathbf{P}$ with $\mu$.


In conclusion we note that if $\Omega = \mathbb{R} = (-\infty, \infty)$, $\mathfrak{F} = \mathfrak{B}$ is the $\sigma$-algebra of
Borel sets, $\mu$ is the Lebesgue measure, and the function $g(x)$ is continuous, then
the integral $\int_{[a,b]} g(x)\, d\mu(x)$ coincides with the Riemann integral $\int_a^b g(x)\, dx$. This
follows from the preceding remarks in part 2 of Sect. 3.3 of this appendix.


3.5  The  Lebesgue  Decomposition  Theorem  and  the 
Radon-Nikodym  Theorem 

We return to a question that has already been asked in the previous section.
Under what conditions on measures $\mu$ and $\lambda$ given on $\langle \Omega, \mathfrak{F} \rangle$ can the measure $\mu$ be
represented as

$$\mu(A) = \int_A f\, d\lambda\,?$$

We do not assume here that $\lambda$ is a probability measure.

Definition A3.5.1 A measure $\mu$ is said to be absolutely continuous with respect to
a measure $\lambda$ (we write $\mu \prec \lambda$) if, for any $A$ such that $\lambda(A) = 0$, one has $\mu(A) = 0$.

Definition A3.5.2 A set $N_\mu$ is said to be a support¹ of measure $\mu$ if
$\mu(\Omega - N_\mu) = 0$.

Definition A3.5.2 specifies a rather wide class of sets which can be called the
support of the measure $\mu$ when $\mu$ is concentrated on a part of the space $\Omega$. If $\Omega = \mathbb{R}$
is the real line (and in some other cases as well), one can use another definition
which specifies a unique set for each measure. Consider the collection of all intervals
$(a, b) \subset \mathbb{R}$ with rational endpoints $a$ and $b$. This collection is countable. Remove
from $\Omega = \mathbb{R}$ all such intervals for which $\mu((a, b)) = 0$. The remaining set (which is
measurable) is called the support of the measure $\mu$.

Definition A3.5.3 One says that a measure $\mu$ is singular with respect to $\lambda$ if there
exists a support $N_\lambda$ of the measure $\lambda$ such that $\mu(N_\lambda) = 0$. Or, which is the same, if
there exists a support $N_\mu$ of the measure $\mu$ such that $\lambda(N_\mu) = 0$.

The last definition, in contrast to Definition A3.5.1, is symmetric, so one can
speak about mutually singular measures $\mu$ and $\lambda$ (this relation is often written as
$\mu \perp \lambda$).


Theorem A3.5.1 (Radon-Nikodym) A necessary and sufficient condition for
the absolute continuity $\mu \prec \lambda$ is that there exists a function $f$, unique up to
$\lambda$-equivalence (i.e., up to values on a set of zero $\lambda$-measure), such that²

$$\mu(A) = \int_A f\, d\lambda.$$

As we have already noted, the function $f$ is called the Radon-Nikodym derivative
$d\mu/d\lambda$ of the measure $\mu$ with respect to $\lambda$ (or density of $\mu$ with respect to $\lambda$).

Since sufficiency in the assertion of the theorem is obvious, we will obtain the
Radon-Nikodym theorem as a consequence of the following Lebesgue decomposition
theorem.


¹The conventional definition of support refers to the case when $\Omega$ is a topological space. Then the
support of $\mu$ is the smallest closed set such that its complement is of $\mu$-measure zero.

²This equality is sometimes adopted as a definition of absolute continuity.

Theorem A3.5.2 (Lebesgue) Let $\mu$ and $\lambda$ be two $\sigma$-finite measures given on $\langle \Omega, \mathfrak{F} \rangle$.
There exists a unique decomposition of the measure $\mu$ into two components $\mu_a$ and
$\mu_s$ such that

$$\mu = \mu_a + \mu_s, \qquad \mu_a \prec \lambda, \qquad \mu_s \perp \lambda.$$

Moreover, there exists a function $f$, unique up to $\lambda$-equivalence, such that

$$\mu_a(A) = \int_A f\, d\lambda.$$

It is obvious that if $\mu \prec \lambda$ then $\mu_s = 0$, and the Lebesgue theorem then implies
the Radon-Nikodym theorem.
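On a finite space the decomposition of Theorem A3.5.2 can be written out explicitly. The sketch below assumes a four-point space with hypothetical point masses for `mu` and `lam`; it is an illustration of the statement, not the construction used in the proof.

```python
from fractions import Fraction

# Hypothetical point masses of mu and lambda on the space {a, b, c, d}.
mu = {"a": Fraction(2), "b": Fraction(0), "c": Fraction(3), "d": Fraction(1)}
lam = {"a": Fraction(1), "b": Fraction(4), "c": Fraction(0), "d": Fraction(2)}

# Density f = d mu_a / d lambda: a ratio of point masses where lambda charges the point.
f = {w: (mu[w] / lam[w] if lam[w] > 0 else Fraction(0)) for w in mu}

# Absolutely continuous part mu_a(A) = sum over A of f * lambda; singular part is the rest.
mu_a = {w: f[w] * lam[w] for w in mu}
mu_s = {w: mu[w] - mu_a[w] for w in mu}

# A support of lambda: the points that lambda charges.
N_lam = {w for w in lam if lam[w] > 0}
```

In this example `mu_s` lives entirely on the point `"c"`, where `lam` vanishes, so `mu_s` gives zero mass to the support `N_lam` of `lam`.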


Proof Since $\mu$ and $\lambda$ are $\sigma$-finite, there exist increasing sequences of sets $\Omega_n^\mu$
and $\Omega_n^\lambda$ such that

$$\mu(\Omega_n^\mu) < \infty, \qquad \lambda(\Omega_n^\lambda) < \infty, \qquad \bigcup_n \Omega_n^\mu = \Omega, \qquad \bigcup_n \Omega_n^\lambda = \Omega.$$

Putting $\Omega_n := \Omega_n^\mu \cap \Omega_n^\lambda$ we obtain a sequence of sets increasing to $\Omega$ for which

$$\mu(\Omega_n) < \infty, \qquad \lambda(\Omega_n) < \infty.$$

If we prove the decomposition theorem for restrictions of the measures $\mu$ and $\lambda$
to $\langle B_n, \mathfrak{F}_n \rangle$, where $B_n = \Omega_{n+1} - \Omega_n$ and $\mathfrak{F}_n$ is formed by sets $B_n A$, $A \in \mathfrak{F}$, we will
thereby prove it for the whole $\Omega$. It will suffice to take $\mu_a$ and $\mu_s$ to be the sums of
the respective components for each of the restrictions. This remark means that we
can consider the case of finite measures only.

Thus let $\mu$ and $\lambda$ be finite measures.

(a) Let $\mathcal{F}$ be the class of functions $f \ge 0$ such that

$$\int_A f\, d\lambda \le \mu(A) \quad \text{for all } A \in \mathfrak{F} \tag{A3.5.1}$$

(the class $\mathcal{F}$ is non-empty, for the function $f \equiv 0$ belongs to $\mathcal{F}$). Set

$$\alpha := \sup_{f \in \mathcal{F}} \int f\, d\lambda \le \mu(\Omega) < \infty$$

and choose a sequence $f_n$ such that, as $n \to \infty$,

$$\int f_n\, d\lambda \to \alpha.$$

Put $\hat f_n := \max(f_1, \ldots, f_n)$. Then clearly $\hat f_n \uparrow f := \sup_n f_n$ and by the monotone
convergence theorem

$$\int_A f\, d\lambda = \lim_{n \to \infty} \int_A \hat f_n\, d\lambda. \tag{A3.5.2}$$

We now show that $f \in \mathcal{F}$, i.e., that (A3.5.1) holds for $f$. To this end, it suffices
by virtue of (A3.5.2) to notice that $\hat f_n \in \mathcal{F}$. Let $A_k$, $k = 1, \ldots, n$, be disjoint sets on
which $\hat f_n = f_k$. Then $\bigcup_{k=1}^{n} A_k = \Omega$ and

$$\int_A \hat f_n\, d\lambda = \sum_{k=1}^{n} \int_{A A_k} f_k\, d\lambda \le \sum_{k=1}^{n} \mu(A A_k) = \mu(A).$$

Thus, for the "maximum" element $f$ of $\mathcal{F}$, (A3.5.1) also holds.

(b) Putting

$$\mu_a(A) := \int_A f\, d\lambda, \qquad \mu_s := \mu - \mu_a \tag{A3.5.3}$$

we prove that $\mu_s$ is singular with respect to $\lambda$. We will need the following assertion
about the decomposition of an arbitrary signed measure (for the definition, see
Sect. 3.3.1 of this appendix).

Theorem A3.5.3 (The Hahn theorem on decomposition of a measure) For any
signed finite measure $\gamma$, there exist disjoint sets $D^+ \in \mathfrak{F}$ and $D^- \in \mathfrak{F}$ such that, for
any $A \in \mathfrak{F}$,

$$\gamma(A D^+) \ge 0, \qquad \gamma(A D^-) \le 0.$$

Proof We first prove that there exists a set $D \in \mathfrak{F}$ on which $\gamma(A)$ attains its upper
bound.

Let $B_n \in \mathfrak{F}$ be a sequence such that $\gamma(B_n) \to \Gamma = \sup_A \gamma(A)$ as $n \to \infty$.
Put $B := \bigcup_k B_k$ and consider, for a fixed $n$, the decomposition of $B$ into $2^n$ sets
$B_{n,m}$, $m = 1, \ldots, 2^n$, of the form $\bigcap_{k=1}^{n} B'_k$, where $B'_k = B_k$ or $B - B_k$, $k \le n$. For
$n < N$, each $B_{n,m}$ is a finite union of sets $B_{N,M}$, $1 \le M \le 2^N$. Denote by $D_n$ the
sum of all $B_{n,m}$ for which $\gamma(B_{n,m}) \ge 0$. Then $\gamma(B_n) \le \gamma(D_n)$.

On the other hand, for $N > n$, each $B_{N,M}$ either belongs to $D_n$ or is disjoint with
it. Therefore

$$\gamma(D_n) \le \gamma(D_n + D_{n+1} + \cdots + D_N).$$

This implies that, for the set $D = \lim_{n \to \infty} \bigcup_{k \ge n} D_k$, one has $\gamma(B_n) \le \gamma(D)$, $\Gamma \le \gamma(D)$.
Recalling the definition of $\Gamma$, we obtain that $\gamma(D) = \Gamma$.

Thus we have proved the existence of a set $D$ on which $\gamma(D)$ attains its maximum.
We now show that, for any $A \in \mathfrak{F}$, one has $\gamma(AD) \ge 0$ and $\gamma(A\bar D) \le 0$,
where $\bar D = \Omega - D$. Indeed, assuming, for instance, that $\gamma(AD) < 0$, we come to a
contradiction, for in that case

$$\gamma(D - AD) = \gamma(D) - \gamma(AD) > \gamma(D).$$

Similarly, assuming that $\gamma(A\bar D) > 0$, we would get

$$\gamma(D + A\bar D) = \gamma(D) + \gamma(A\bar D) > \gamma(D).$$

It remains to put $D^+ := D$, $D^- := \bar D$. The theorem is proved. □




Corollary A3.5.1 Any finite signed measure $\gamma$ can be represented as $\gamma = \gamma^+ - \gamma^-$,
where $\gamma^\pm$ are finite nonnegative measures.


To prove the corollary, it suffices to put

$$\gamma^\pm(A) := \pm \gamma(A \cap D^\pm),$$

where $D^\pm$ are the sets from the Hahn decomposition theorem. □
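For a signed measure on a finite set the Hahn sets and the two parts $\gamma^\pm$ can be listed directly. The sketch below assumes hypothetical point masses `gamma` and takes $D^+$ to be the points of nonnegative mass, which is one admissible choice in the finite case.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical signed point masses on the space {a, b, c, d}.
gamma = {"a": Fraction(3), "b": Fraction(-2), "c": Fraction(0), "d": Fraction(-1, 2)}

# One valid Hahn decomposition in the finite case: split points by the sign of their mass.
D_plus = {w for w, v in gamma.items() if v >= 0}
D_minus = set(gamma) - D_plus


def g(A):
    """gamma(A) for a subset A of the space."""
    return sum((gamma[w] for w in A), Fraction(0))


def g_plus(A):
    """gamma^+(A) = gamma(A intersected with D^+)."""
    return g(set(A) & D_plus)


def g_minus(A):
    """gamma^-(A) = -gamma(A intersected with D^-)."""
    return -g(set(A) & D_minus)


def all_subsets():
    """Every subset of the space, as a list of sets."""
    pts = list(gamma)
    return [set(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]
```

Looping over `all_subsets()` confirms that `g(A) == g_plus(A) - g_minus(A)` with both parts nonnegative for every `A`.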


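We return to the proof of the fact that the measure $\mu_s$ in equality (A3.5.3) is
singular. Let $D_n^+$ be the set in the Hahn decomposition of the signed measure

$$\gamma_n = \mu_s - \frac{1}{n} \lambda.$$

Put $N = \bigcap_n D_n^-$. Then $\bar N = \bigcup_n D_n^+$ and, for all $n$ and $A \in \mathfrak{F}$,

$$0 \le \mu_s(AN) \le \frac{1}{n} \lambda(AN).$$

From here, letting $n \to \infty$, we obtain $\mu_s(AN) = 0$ and hence $\mu_s(A) = \mu_s(A\bar N)$.
That is, the set $\bar N$ is a support of the measure $\mu_s$.

Further, because

$$\mu_a(A) = \mu(A) - \mu_s(A\bar N) \le \mu(A) - \mu_s(A D_n^+),$$

we have

$$\int_A \Bigl( f + \frac{1}{n} I_{D_n^+} \Bigr) d\lambda = \mu_a(A) + \frac{1}{n} \lambda(A D_n^+) \le \mu(A) - \gamma_n(A D_n^+) \le \mu(A).$$

This means that $f + \frac{1}{n} I_{D_n^+} \in \mathcal{F}$ and hence

$$\alpha \ge \int \Bigl( f + \frac{1}{n} I_{D_n^+} \Bigr) d\lambda = \alpha + \frac{1}{n} \lambda(D_n^+).$$

This implies $\lambda(D_n^+) = 0$ and $\lambda(\bar N) = 0$, so that $\mu_s$ is singular with respect to $\lambda$ since
$\bar N$ is a support of $\mu_s$. □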

Uniqueness of the decomposition $\mu = \mu_a + \mu_s$ can be established as follows.
Assume that $\mu = \mu'_a + \mu'_s$ is another decomposition. Then $\gamma := \mu'_a - \mu_a = \mu_s - \mu'_s$.
By singularity, there exist sets $N$ and $N'$ such that $\mu_s(N) = 0$, $\lambda(\bar N) = 0$, $\mu'_s(N') = 0$,
and $\lambda(\bar N') = 0$. Clearly, $\lambda(D) = 0$, where $D = \bar N \cup \bar N'$. If we assumed that
$\gamma = \mu'_a - \mu_a = \mu_s - \mu'_s \not\equiv 0$, then there would exist an $A \in \mathfrak{F}$ such that $\gamma(A) \ne 0$.
Therefore, either $\gamma(AD) \ne 0$ or $\gamma(A\bar D) \ne 0$. However, the former is impossible,
for $\lambda(D) = 0$ implies $\mu'_a(D) = \mu_a(D) = 0$. The latter is also impossible, since
$\bar D = N \cap N'$ and hence $\mu_s(\bar D) = \mu'_s(\bar D) = 0$.

Uniqueness of the function $f$ (up to $\lambda$-equivalence) follows from the observation
that the equalities

$$\mu_a(A) = \int_A f\, d\lambda = \int_A f'\, d\lambda, \qquad \int_A (f - f')\, d\lambda = 0$$

for all $A$ imply the equality $f - f' = 0$ a.e. Assuming, say, that $\lambda(A) > 0$ for
$A = \{\omega : f - f' > \delta\}$ would yield for such $A$ the relation $\int_A (f - f')\, d\lambda > 0$. The
theorem is proved. □

One of the most important applications of the Radon-Nikodym theorem is the
proof of existence and uniqueness of conditional expectations.


Proof Let $\mathfrak{F}_0$ be a $\sigma$-subalgebra of $\mathfrak{F}$ and $\xi$ a random variable on $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$ such
that $\mathbf{E}\xi$ exists. In Sect. 4.8 we defined the conditional expectation $\mathbf{E}(\xi \mid \mathfrak{F}_0)$ of the
variable $\xi$ given $\mathfrak{F}_0$ as an $\mathfrak{F}_0$-measurable random variable $q$ for which

$$\mathbf{E}(q; B) = \mathbf{E}(\xi; B) \tag{A3.5.4}$$

for any $B \in \mathfrak{F}_0$. We can assume without loss of generality that $\xi \ge 0$ (an arbitrary
function $\xi$ can be represented as a difference of two positive functions). Then
the right-hand side of (A3.5.4) will be a measure on $\langle \Omega, \mathfrak{F}_0 \rangle$. Since $\mathbf{E}(\xi; B) = 0$
if $\mathbf{P}(B) = 0$, this measure will be absolutely continuous with respect to $\mathbf{P}$. This
implies, by the Radon-Nikodym theorem, the existence of a unique (up to
$\mathbf{P}$-equivalence) measurable function $q$ on $\langle \Omega, \mathfrak{F}_0 \rangle$ such that, for any $B \in \mathfrak{F}_0$,

$$\mathbf{E}(\xi; B) = \int_B q\, d\mathbf{P}.$$

This relation is clearly equivalent to (A3.5.4). It establishes the required existence
and uniqueness of the conditional expectation. □
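When the sub-$\sigma$-algebra is generated by a finite partition, the density in (A3.5.4) is a ratio on each atom. The sketch below assumes a uniform six-point space, the variable `xi` equal to the square of the outcome, and a three-atom partition; all of these are hypothetical choices for illustration.

```python
from fractions import Fraction

# Hypothetical setting: six equally likely outcomes, xi(w) = w^2.
omega = range(6)
prob = {w: Fraction(1, 6) for w in omega}
xi = {w: Fraction(w * w) for w in omega}

# Atoms of the partition that generates the sub-sigma-algebra.
atoms = [{0, 1}, {2, 3, 4}, {5}]


def E(var, B):
    """E(var; B): the integral of var over the event B."""
    return sum(var[w] * prob[w] for w in B)


def Pr(B):
    """Probability of the event B."""
    return sum(prob[w] for w in B)


# On each atom the conditional expectation is the constant E(xi; B) / P(B).
q = {}
for B in atoms:
    value = E(xi, B) / Pr(B)
    for w in B:
        q[w] = value
```

The resulting `q` is constant on each atom and satisfies `E(q, B) == E(xi, B)` for every union `B` of atoms, which is relation (A3.5.4) in this setting.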


Another consequence of the assertions proved in the present section was
mentioned in Sect. 3.6 and is related to the Lebesgue theorem stating that any
distribution $\mathbf{P}$ on the real line $\mathbb{R} = (-\infty, \infty)$ (or the respective distribution function) has a
unique representation as a sum of the three components $\mathbf{P} = \mathbf{P}_a + \mathbf{P}_s + \mathbf{P}_d$, where
the component $\mathbf{P}_a$ is absolutely continuous with respect to Lebesgue measure:

$$\mathbf{P}_a(A) = \int_A f(x)\, dx;$$

$\mathbf{P}_d$ is the discrete component concentrated on an at most countable set of points
$x_1, x_2, \ldots$ such that $\mathbf{P}(\{x_k\}) > 0$, and the component $\mathbf{P}_s$ has a support of Lebesgue
measure zero and a continuous distribution function. This is an immediate
consequence of the Lebesgue decomposition theorem. One just has to extract the
discrete part from the singular (with respect to Lebesgue measure $\lambda$) component of $\mathbf{P}$,
first removing all the points $x$ for which $\mathbf{P}(\{x\}) \ge 1/2$, then all points $x$ for which
$\mathbf{P}(\{x\}) \ge 1/3$, and so on. It is clear that in this way we will get at most a countable
set of $x$s, and that this process determines uniquely the discrete component $\mathbf{P}_d$.

All the aforesaid clearly also applies to distributions in $n$-dimensional Euclidean
spaces $\mathbb{R}^n$.
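The extraction of atoms by decreasing thresholds can be sketched as follows. The dictionary `atoms` of point masses is a hypothetical input; in the text the atoms are found from the singular component of $\mathbf{P}$, which this sketch does not attempt to compute.

```python
from fractions import Fraction

# Hypothetical atoms of a distribution P: point -> P({point}).
atoms = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 16)}


def extract_atoms(point_mass, max_pass):
    """Collect atoms in passes m = 2, ..., max_pass, using the threshold 1/m in pass m."""
    found = []
    for m in range(2, max_pass + 1):
        threshold = Fraction(1, m)
        # At most m points can have mass >= 1/m, so every pass adds finitely many points.
        new = sorted(x for x, p in point_mass.items()
                     if p >= threshold and x not in found)
        found.extend(new)
    return found
```

Each pass adds only finitely many points, so the union over all passes is at most countable, as stated above.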


3.6  Weak  Convergence  and  Convergence  in  Total  Variation  of 
Distributions  in  Arbitrary  Spaces 

3.6.1  Weak  Convergence 

In Sects. 6.2 and 7.6 we studied weak convergence of distributions of random
variables and vectors, i.e., weak convergence of distributions in $\mathbb{R}^k$, $k \ge 1$. Now we
want to introduce the notion of weak convergence in more general spaces $\mathcal{X}$. As the
definitions given in Sect. 6.2 show, we will need continuous functions $f(x)$ on $\mathcal{X}$.
This is possible only if the space $\mathcal{X}$ is endowed with a topology. For simplicity's
sake, we restrict ourselves to the case where the space $\mathcal{X}$ is endowed with a
metric $\rho(x, y)$. Thus, assume we are given a measurable space $\langle \mathcal{X}, \mathfrak{B} \rangle$ with a metric $\rho$
which is "consistent" with the $\sigma$-algebra $\mathfrak{B}$, i.e., all open (with respect to the
metric $\rho$) sets from $\mathcal{X}$ belong to $\mathfrak{B}$ (cf. Sect. 16.1), so that any continuous (with respect
to $\rho$) functional will be $\mathfrak{B}$-measurable. This means that if a distribution $\mathbf{Q}$ is given
on $\langle \mathcal{X}, \mathfrak{B} \rangle$ (i.e., a probability space $\langle \mathcal{X}, \mathfrak{B}, \mathbf{Q} \rangle$ is given), then $\{x : f(x) < t\} \in \mathfrak{B}$ for
any $t$, and the probabilities of these sets are defined.

Now let $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$ be the basic probability space. A measurable mapping
$\xi = \xi(\omega)$ of the space $\langle \Omega, \mathfrak{F} \rangle$ to $\langle \mathcal{X}, \mathfrak{B} \rangle$ is called an $\mathcal{X}$-valued random element.
If $\langle \Omega, \mathfrak{F} \rangle = \langle \mathcal{X}, \mathfrak{B} \rangle$, the mapping $\xi$ may be the identity mapping. The space $\langle \mathcal{X}, \mathfrak{B} \rangle$
is said to be the sample or state space of the random element $\xi$. When a functional
$f$ is continuous, $f(\xi)$ is a random variable in $\langle \mathbb{R}, \mathfrak{B} \rangle$.

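Definition A3.6.1 Let a distribution $\mathbf{P}$ and a sequence of distributions $\mathbf{P}_n$ be given
on the space $\langle \mathcal{X}, \mathfrak{B} \rangle$. The sequence $\mathbf{P}_n$ is said to converge weakly to $\mathbf{P}$: $\mathbf{P}_n \Rightarrow \mathbf{P}$ as
$n \to \infty$ if, for any bounded continuous functional $f$ ($f \in C_b(\mathcal{X})$),

$$\int f(x)\, d\mathbf{P}_n(x) \to \int f(x)\, d\mathbf{P}(x). \tag{A3.6.1}$$

If $\xi_n$ and $\xi$ are random elements having the distributions $\mathbf{P}_n$ and $\mathbf{P}$, respectively,
then (A3.6.1) is equivalent to

$$\mathbf{E} f(\xi_n) \to \mathbf{E} f(\xi). \tag{A3.6.2}$$

This, in turn, for any continuous functional $f$ ($f \in C(\mathcal{X})$), is equivalent to

$$\mathbf{P}(f(\xi_n) < t) \Rightarrow \mathbf{P}(f(\xi) < t). \tag{A3.6.3}$$

Indeed, (A3.6.3) means that, for any bounded continuous function $g$ ($g \in C_b(\mathbb{R})$),

$$\mathbf{E}\, g(f(\xi_n)) \to \mathbf{E}\, g(f(\xi)), \tag{A3.6.4}$$

which is equivalent to (A3.6.2).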

If $\mathcal{X} = \mathcal{X}(T)$ is the space of real-valued functions $x(t)$, $t \in T$, given on a parametric
set $T$, and a measurable mapping $\xi(\omega)$ of the basic probability space $\langle \Omega, \mathfrak{F}, \mathbf{P} \rangle$
into $\langle \mathcal{X}, \mathfrak{B} \rangle$ is given, then the random element $\xi(\omega) = \xi(t, \omega)$ will be a random
process (see Sect. 18.1) if $\{x : x(t) < u\} \in \mathfrak{B}$ for all $t$, $u$. In that case (A3.6.1)-(A3.6.4)
will refer to the weak convergence of the distributions of random processes which
has already been studied in Chap. 20.

In the metric space $\mathcal{X}$, for any $A \subset \mathcal{X}$, one can define its boundary

$$\partial A = [A] - (A),$$

where $[A]$ is the closure of $A$, $(A)$ being its interior ($(A) = \mathcal{X} - [\bar A]$, where $\bar A$ is the
complement of $A$).

Definition A3.6.2 A set $A$ is said to be a continuity set of the distribution $\mathbf{P}$ (or
$\mathbf{P}$-continuous set) if $\mathbf{P}(\partial A) = 0$. We will denote the class of all $\mathbf{P}$-continuous sets
by $\mathcal{D}_{\mathbf{P}}$.

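The following criterion of weak convergence of distributions holds true.

Theorem A3.6.1 The following four conditions are equivalent:

(i) $\mathbf{P}_n \Rightarrow \mathbf{P}$,

(ii) $\lim_{n \to \infty} \mathbf{P}_n(A) = \mathbf{P}(A)$ for all $A \in \mathcal{D}_{\mathbf{P}}$,

(iii) $\limsup_{n \to \infty} \mathbf{P}_n(F) \le \mathbf{P}(F)$ for all closed $F \subset \mathcal{X}$,

(iv) $\liminf_{n \to \infty} \mathbf{P}_n(G) \ge \mathbf{P}(G)$ for all open $G \subset \mathcal{X}$.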
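The role of continuity sets in conditions (ii)-(iv) shows up already for point masses. The sketch below assumes the hypothetical example in which $\mathbf{P}_n$ is the unit mass at $1/n$ and $\mathbf{P}$ is the unit mass at $0$ on the real line, so that $\mathbf{P}_n \Rightarrow \mathbf{P}$; sets are passed as indicator functions.

```python
from fractions import Fraction


def P_n(n, indicator):
    """Mass that the unit point mass at 1/n assigns to the set with this indicator."""
    return 1 if indicator(Fraction(1, n)) else 0


def P(indicator):
    """Mass that the unit point mass at 0 assigns to the same set."""
    return 1 if indicator(Fraction(0)) else 0


def closed_F(x):
    """Indicator of the closed set F = {0}."""
    return x == 0


def open_G(x):
    """Indicator of the open set G = (0, 1); its boundary {0, 1} has P-mass 1."""
    return 0 < x < 1


def cont_A(x):
    """Indicator of A = (-1, 1/2); its boundary {-1, 1/2} has P-mass 0."""
    return -1 < x < Fraction(1, 2)
```

For `closed_F` the inequality in (iii) is strict (0 against 1), for `open_G` the inequality in (iv) is strict (1 against 0), and for the continuity set `cont_A` the values agree for all `n >= 3`, as (ii) requires.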

Observe that if $\mathbf{P}_n \Rightarrow \mathbf{P}$, then convergence (A3.6.1)-(A3.6.3) takes place for
a wider class of functionals than $C_b(\mathcal{X})$ ($C(\mathcal{X})$), namely, for the so-called
$\mathbf{P}$-continuous functionals (or functionals continuous with $\mathbf{P}$-probability 1). We will
call so the functionals $f$ for which $f(x_n) \to f(x)$ as $\rho(x_n, x) \to 0$ not for all $x \in \mathcal{X}$,
but only for $x \in A$, $\mathbf{P}(A) = 1$. The class of $\mathbf{P}$-continuous functionals will be denoted
by $C_{\mathbf{P}}(\mathcal{X})$.

The classes $\mathcal{D}_{\mathbf{P}}$ and $C_{\mathbf{P}}(\mathcal{X})$, and also the classes of all closed and open sets
participating in Theorem A3.6.1, are very wide, which makes verifying the conditions
of Theorem A3.6.1 rather difficult and cumbersome. These classes can be
substantially restricted if we consider not arbitrary but only relatively compact sequences
$\mathbf{P}_n$ (from any subsequence $\mathbf{P}_{n'}$ one can choose a convergent subsequence; this
approach was already used in Sect. 6.3).

Definition A3.6.3 A class $\mathcal{D}$ of sets from $\mathfrak{B}$ is said to determine the measure $\mathbf{P}$ if,
for a measure $\mathbf{Q}$, the equalities $\mathbf{P}(A) = \mathbf{Q}(A)$ for all $A \in \mathcal{D}\mathcal{D}_{\mathbf{P}}$ imply $\mathbf{Q} = \mathbf{P}$.

A class $\mathcal{D}$ determines the measure $\mathbf{P}$ if $\mathcal{D}$ is an algebra and $\sigma(\mathcal{D}\mathcal{D}_{\mathbf{P}}) = \mathfrak{B}_{\mathcal{X}}$
(condition $\sigma(\mathcal{D}) = \mathfrak{B}_{\mathcal{X}}$ is insufficient (see e.g. [4])).

In a similar way we introduce the class $\mathcal{F}$ of functionals $f$ determining the
distribution $\mathbf{P}$ of a random element $\xi = \xi_{\mathbf{P}}$: for any $\mathbf{Q}$, the coincidence of the distributions
of $f(\xi_{\mathbf{P}})$ and $f(\xi_{\mathbf{Q}})$ for all $f \in \mathcal{F} C_{\mathbf{P}}(\mathcal{X})$ implies $\mathbf{P} = \mathbf{Q}$.

Theorem A3.6.2 A necessary and sufficient condition for convergence $\mathbf{P}_n \Rightarrow \mathbf{P}$ is
that:

(1) the sequence $\mathbf{P}_n$ is relatively compact; and

(2) there exists a class of sets $\mathcal{D} \subset \mathfrak{B}_{\mathcal{X}}$ determining the measure $\mathbf{P}$ and such that
$\mathbf{P}_n(A) \to \mathbf{P}(A)$ for any $A \in \mathcal{D}\mathcal{D}_{\mathbf{P}}$.

An alternative to condition (2) is the existence of a class of functionals $\mathcal{F}$ which
determines the measure $\mathbf{P}$ and is such that $\mathbf{P}(f(\xi_n) < t) \Rightarrow \mathbf{P}(f(\xi) < t)$ for all
$f \in \mathcal{F} C_{\mathbf{P}}(\mathcal{X})$.

The following notion of tightness plays an important role in establishing the
compactness of $\{\mathbf{P}_n\}$.

Definition A3.6.4 A family of distributions $\{\mathbf{P}_n\}$ on $\langle \mathcal{X}, \mathfrak{B} \rangle$ is said to be tight if,
for any $\varepsilon > 0$, there exists a compact set $K = K_\varepsilon \subset \mathcal{X}$ such that $\mathbf{P}_n(K) > 1 - \varepsilon$ for
all $n$.

Theorem A3.6.3 (Prokhorov) If $\{\mathbf{P}_n\}$ is a tight family of distributions then it is
relatively compact. If $\mathcal{X}$ is a complete separable space, the converse assertion is
also true.

Since, for many functional spaces (in particular, for spaces $C(0, T)$ and $D(0, T)$),
there exist simple explicit criteria for compactness of sets, one can now establish
conditions ensuring convergence $\mathbf{P}_n \Rightarrow \mathbf{P}$ in these spaces. It is well known, for
example, that in the above-mentioned spaces compacta are, roughly speaking, of the
form $\{x : \omega_\Delta(x) \le \varepsilon(\Delta)\}$, where $\omega_\Delta(x)$ is the so-called "modulus of continuity"
(in the space $C$ or $D$, respectively) of the element $x$, and $\varepsilon(\Delta) > 0$ is an arbitrary
function vanishing as $\Delta \downarrow 0$.

The proofs of Theorems A3.6.1-A3.6.3 can be found, for example, in [1]. We do
not present them here as they are somewhat beyond the scope of this book and, on
the other hand, the theorems themselves are not used in the body of the text. We use
only the special cases of these theorems given in Sects. 6.2 and 6.3.

The invariance principle of Sect. 20.1 is a theorem about weak convergence of
distributions in the space $C(0, 1)$. In order to use Theorems A3.6.2 and A3.6.3 to
prove this result, one has to choose the class $\mathcal{D}$ to be the class of cylinder sets.
Convergence of $\mathbf{P}_n$ to $\mathbf{P}$ on sets from this class $\mathcal{D}$ is the convergence of
finite-dimensional distributions of processes $s_n(t)$ generated by sums of random
variables (see Sect. 20.1). Since the increments of $s_n(t)$ are essentially independent,
the demonstration of that part of the theorem reduces to proving asymptotic
normality of these increments, which follows immediately from the central limit theorem.
The condition of compactness of the family of distributions in $C(0, 1)$ requires,
according to Theorem A3.6.3, a proof that the modulus of continuity of the trajectory
$s_n(t)$ converges to zero in probability (for more details, see e.g. [1]). This could be
proved using the Kolmogorov inequality from Corollary 11.2.1.


3.6.2  Convergence  in  Total  Variation 


So, to consider weak convergence of distributions in spaces $\langle \mathcal{X}, \mathfrak{B} \rangle$ of a general
nature, one has to introduce a topology in the space, which is not always convenient
and feasible. There exists another type of convergence of distributions on $\langle \mathcal{X}, \mathfrak{B} \rangle$
which does not require the introduction of topologies. This is convergence in total
variation.


Definition A3.6.5 Let $\gamma$ be a finite signed measure on $\langle \mathcal{X}, \mathfrak{B} \rangle$. The total variation
of $\gamma$ (or the total variation norm $\|\gamma\|$) is the quantity

$$\|\gamma\| = \sup_{f : |f| \le 1} \int f(x)\, d\gamma(x), \tag{A3.6.5}$$

where the supremum is taken over the class of all $\mathfrak{B}$-measurable functions $f(x)$
such that $|f(x)| \le 1$ for all $x \in \mathcal{X}$.


The supremum in (A3.6.5) is clearly attained on such functions $f$ for which,
roughly speaking, $f(x) = 1$ at points $x$ such that $d\gamma(x) > 0$, and $f(x) = -1$ at
points $x$ for which $d\gamma(x) < 0$. Therefore (A3.6.5) can be written in the form

$$\|\gamma\| = \int |d\gamma(x)|. \tag{A3.6.6}$$

An exact meaning to this expression can be given using the Hahn decomposition
theorem (see Corollary A3.5.1), which implies

$$\|\gamma\| = \gamma^+(\mathcal{X}) + \gamma^-(\mathcal{X}). \tag{A3.6.7}$$

The right-hand side of this equality may be taken as a definition of $\int |d\gamma(x)|$.


Lemma A3.6.2 If $\gamma(\mathcal{X}) = 0$, then $\|\gamma\| = 2 \sup_{B \in \mathfrak{B}} \gamma(B)$.


Proof From (A3.6.5) it follows that, for any $B$ ($\bar B$ is the complement of $B$,
$\gamma(B) + \gamma(\bar B) = 0$),

$$\|\gamma\| \ge |\gamma(B)| + |\gamma(\bar B)| = 2 |\gamma(B)|.$$

Therefore $\|\gamma\| \ge 2 \sup_{B \in \mathfrak{B}} |\gamma(B)|$.

To obtain the converse inequality, we will make use of Corollary A3.5.1 of the
Hahn decomposition theorem. As we have already noted, according to that theorem
(for the definition of the set $D^\pm$ see the Hahn theorem),

$$\|\gamma\| = \gamma^+(\mathcal{X}) + \gamma^-(\mathcal{X}) = \gamma^+(D^+) + \gamma^-(D^-)
= \gamma(D^+) - \gamma(D^-) = 2\gamma(D^+) \le 2 \sup_{B \in \mathfrak{B}} \gamma(B).$$

The lemma is proved. □
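On a finite set the identity of the lemma can be confirmed by enumerating all subsets. The sketch below assumes hypothetical point masses `gamma` that sum to zero and compares $\gamma^+(\mathcal{X}) + \gamma^-(\mathcal{X})$ with twice the largest value of $\gamma(B)$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical signed point masses with gamma(X) = 0.
gamma = {"a": Fraction(1, 2), "b": Fraction(-1, 4), "c": Fraction(-1, 4), "d": Fraction(0)}


def measure(B):
    """gamma(B) for a subset B of the space."""
    return sum((gamma[w] for w in B), Fraction(0))


def total_variation():
    """gamma^+(X) + gamma^-(X): the sum of |gamma({w})| over all points, as in (A3.6.7)."""
    return sum(abs(v) for v in gamma.values())


def sup_over_sets():
    """The largest value of gamma(B) over all subsets B, found by brute force."""
    points = list(gamma)
    subsets = (c for r in range(len(points) + 1) for c in combinations(points, r))
    return max(measure(B) for B in subsets)
```

The brute-force maximum is attained on the set of points with positive mass, which plays the part of $D^+$ in the proof.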

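Definition A3.6.6 Let $\mathbf{P}$ be a distribution and $\mathbf{P}_n$, $n = 1, 2, \ldots$, a sequence of distributions given on $\langle X, \mathfrak{B}\rangle$. We will say that $\mathbf{P}_n$ converges to $\mathbf{P}$ in total variation: $\mathbf{P}_n \xrightarrow{TV} \mathbf{P}$, if $\|\mathbf{P}_n - \mathbf{P}\| \to 0$ as $n \to \infty$.

Convergence in total variation is a very strong form of convergence. If $\langle X, \mathfrak{B}\rangle$ is a metric space and $\mathbf{P}_n \xrightarrow{TV} \mathbf{P}$, then $\mathbf{P}_n \Rightarrow \mathbf{P}$. Indeed, since any functional $f \in C_b(X)$ is bounded: $|f(x)| < b$, we have

$$\left|\int f\,d\mathbf{P}_n - \int f\,d\mathbf{P}\right| \le b\,\|\mathbf{P}_n - \mathbf{P}\| \to 0.$$

Thus in that case

$$\int f\,d\mathbf{P}_n \to \int f\,d\mathbf{P}$$

even without assuming the continuity of $f$.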

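The converse assertion about convergence $\mathbf{P}_n \xrightarrow{TV} \mathbf{P}$ if $\mathbf{P}_n \Rightarrow \mathbf{P}$ is not true. Let, for example, $X = [0, 1]$, $\mathbf{P}_n$ be the uniform distribution on the set of $n + 1$ points $\{0, 1/n, \ldots, n/n\}$, and $\mathbf{P} = \mathbf{U}_{0,1}$. It is clear that all $\mathbf{P}_n$ are concentrated on the countable set $N$ of all rational numbers. Therefore $\mathbf{P}_n(N) = 1$, $\mathbf{P}(N) = 0$, and $\|\mathbf{P}_n - \mathbf{P}\| = \mathbf{P}_n(N) + \mathbf{P}(X \setminus N) = 2$. At the same time, clearly $\mathbf{P}_n \Rightarrow \mathbf{P}$.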
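The counterexample can be illustrated numerically. The sketch below assumes an arbitrary bounded continuous test function, $\cos 3x$ (not taken from the text), and compares its integrals with respect to the discrete uniform distribution on $\{0, 1/n, \ldots, n/n\}$ and with respect to the uniform distribution on $[0, 1]$.

```python
import math

def integral_discrete_uniform(f, n):
    """Integral of f with respect to the uniform distribution on {0, 1/n, ..., n/n}."""
    return sum(f(k / n) for k in range(n + 1)) / (n + 1)

def f(x):
    # Bounded continuous test function; an arbitrary illustrative choice.
    return math.cos(3.0 * x)

# Integral of cos(3x) over [0, 1] with respect to the uniform distribution.
exact = math.sin(3.0) / 3.0

errors = [abs(integral_discrete_uniform(f, n) - exact) for n in (10, 100, 1000)]
print(errors)

# The total variation distance does not shrink: each discrete distribution puts
# mass 1 on a finite set of Lebesgue measure 0, so the distance equals 1 + 1 = 2.
tv_distance = 2.0
print(tv_distance)
```

The integrals converge, as weak convergence requires, while the total variation distance stays equal to 2 for every $n$.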

Now let the distribution $\mathbf{P}$ have a density $p$ with respect to a measure $\mu$ (one could take, in particular, $\mu = \mathbf{P}$, in which case $p(x) \equiv 1$). Denote by $p_n$ the density (with respect to $\mu$) of the absolutely continuous (with respect to $\mu$) component $\mathbf{P}_n^a$ of the distribution $\mathbf{P}_n$.


Theorem A3.6.4 A necessary and sufficient condition for convergence $\mathbf{P}_n \xrightarrow{TV} \mathbf{P}$ is that $p_n$ converges to $p$ in measure $\mu$, i.e., for any $\varepsilon > 0$,

$$\mu\{x : |p_n(x) - p(x)| > \varepsilon\} \to 0 \quad \text{as } n \to \infty.$$


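Proof We have

$$\int |d(\mathbf{P}_n - \mathbf{P})| = \int |p_n(x) - p(x)|\,\mu(dx) + \mathbf{P}_n^s(X),$$

where $\mathbf{P}_n^s$ is the singular component of $\mathbf{P}_n$ with respect to the measure $\mu$.

Let $\|\mathbf{P}_n - \mathbf{P}\| \to 0$. Then

$$\int |p_n - p|\,d\mu \to 0, \tag{A3.6.8}$$

and hence

$$\mu\{x : |p_n(x) - p(x)| > \varepsilon\} \le \varepsilon^{-1} \int |p_n - p|\,d\mu \to 0.$$

Now let $p_n \xrightarrow{\mu} p$. Put

$$B_\varepsilon = \{x : p(x) > \varepsilon\}, \qquad A_{n,\varepsilon} = \{x : |p_n(x) - p(x)| \le \varepsilon^2\}.$$

Then

$$1 \ge \int_{B_\varepsilon} p\,d\mu \ge \varepsilon\,\mu(B_\varepsilon), \qquad \mu(B_\varepsilon) \le \frac{1}{\varepsilon}.$$

Consider

$$\int |p_n - p|\,d\mu = \int_{B_\varepsilon A_{n,\varepsilon}} + \int_{\overline{B_\varepsilon A_{n,\varepsilon}}}. \tag{A3.6.9}$$

Here the first integral on the right-hand side does not exceed $\varepsilon$. Since

$$\lim_{\varepsilon \to 0} \int_{B_\varepsilon} p\,d\mu = 1,$$

we will have, for a given $\delta > 0$ and sufficiently small $\varepsilon$, the inequality

$$\int_{B_\varepsilon} p\,d\mu > 1 - \delta$$

and, for $n$ large enough,

$$\int_{B_\varepsilon A_{n,\varepsilon}} p\,d\mu > 1 - 2\delta, \qquad \int_{B_\varepsilon A_{n,\varepsilon}} p_n\,d\mu > 1 - 3\delta. \tag{A3.6.10}$$

It follows from these two inequalities that the second integral in (A3.6.9) does not exceed $5\delta$, which proves (A3.6.8). Furthermore, (A3.6.10) implies that $\|\mathbf{P}_n^a\| \ge 1 - 3\delta$ and $\|\mathbf{P}_n^s\| \le 3\delta$. The theorem is proved. □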


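The theorem implies that if $\mathbf{P}_n \xrightarrow{TV} \mathbf{P}$, then the component $\mathbf{P}_n^a$ of the distribution $\mathbf{P}_n$ that is absolutely continuous with respect to $\mu = \mathbf{P}$ has a density $p_n(x) \xrightarrow{\mathbf{P}} 1$, and $\mathbf{P}_n^a(X) \to 1$.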


Appendix  4 

The  Helly  and  Arzela-Ascoli  Theorems 


In this appendix we will prove Helly's theorem and the Arzela-Ascoli theorem. The former theorem was used in Sect. 6.3, and both theorems will be used in the proof of the main theorem of Appendix 9.

Let $\mathcal{F}$ be the class of all distribution functions, and $\mathcal{G}$ the class of functions $G$ possessing properties F1 and F2 from Sect. 3.2 (monotonicity and left continuity), and the properties $G(-\infty) \ge 0$ and $G(\infty) \le 1$. We will write $G_n \Rightarrow G$ as $n \to \infty$, $G \in \mathcal{G}$, if $G_n(x) \to G(x)$ at all points of continuity of the function $G$.

Theorem A4.1 (Helly) Any sequence $F_n \in \mathcal{F}$ contains a convergent subsequence $F_{n_k} \Rightarrow F \in \mathcal{G}$.
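The limit in Helly's theorem may fail to be a distribution function, which is why the wider class of limits is needed. A minimal sketch, assuming a hypothetical sequence that puts mass 1/2 at the point 0 and mass 1/2 at the point $n$ (this example is not from the text):

```python
def F(n, x):
    """Left-continuous distribution function of the mixture 0.5*delta_0 + 0.5*delta_n."""
    mass_at_zero = 0.5 if x > 0 else 0.0
    mass_at_n = 0.5 if x > n else 0.0
    return mass_at_zero + mass_at_n

def G(x):
    """Pointwise limit of F(n, x) as n grows: half of the mass escapes to infinity."""
    return 0.5 if x > 0 else 0.0

points = [-5.0, 0.0, 3.5, 50.0]
values_large_n = [F(10**6, x) for x in points]
print(values_large_n)
print([G(x) for x in points])
```

Here the limit function is non-decreasing and left continuous, but its value at infinity is 1/2 rather than 1.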

We  will  need  the  following. 

Lemma A4.1 A sufficient condition for convergence $F_n \Rightarrow F \in \mathcal{G}$ is that

$$F_n(x) \to F(x), \quad x \in D,$$

as $n \to \infty$ on some everywhere dense set $D$ of the reals.

Proof Let $x$ be an arbitrary point of continuity of $F(x)$. For arbitrary $x', x'' \in D$ such that $x' \le x \le x''$, one has

$$F_n(x') \le F_n(x) \le F_n(x'').$$

Consequently,

$$\lim_{n\to\infty} F_n(x') \le \liminf_{n\to\infty} F_n(x) \le \limsup_{n\to\infty} F_n(x) \le \lim_{n\to\infty} F_n(x'').$$

From here and the conditions of the lemma we obtain

$$F(x') \le \liminf_{n\to\infty} F_n(x) \le \limsup_{n\to\infty} F_n(x) \le F(x'').$$


A.A.  Borovkov,  Probability  Theory ,  Universitext, 

DOI  10.1007/978-1-4471-5201-9,  ©  Springer- Verlag  London  2013 




Letting $x' \uparrow x$ and $x'' \downarrow x$ along the set $D$ and taking into account that $x$ is a point of continuity of $F$, we get

$$\lim_{n\to\infty} F_n(x) = F(x).$$

The lemma is proved. □

Proof of Theorem A4.1 Let $D = \{x_n\}$ be an arbitrary countable everywhere dense set of real numbers. The numerical sequence $\{F_n(x_1)\}$ is bounded and hence contains a convergent subsequence $\{F_{1n}(x_1)\}$. Denote the limit of this subsequence by $F(x_1)$. Consider now the numerical sequence $\{F_{1n}(x_2)\}$. It also contains a convergent subsequence $\{F_{2n}(x_2)\}$ with a limit $F(x_2)$. Moreover,

$$\lim_{n\to\infty} F_{2n}(x_1) = F(x_1).$$

Continuing this process, we will obtain, for any number $k$, $k$ sequences

$$\{F_{kn}(x_i)\}, \quad i = 1, \ldots, k,$$

such that $\lim_{n\to\infty} F_{kn}(x_i) = F(x_i)$.

Consider the diagonal sequence of the distribution functions $\{F_{nn}(x)\}$. For any $x_k \in D$, only the first $k - 1$ elements of the numerical sequence $\{F_{nn}(x_k)\}$ may not belong to the sequence $F_{kn}(x_k)$. Therefore

$$\lim_{n\to\infty} F_{nn}(x_k) = F(x_k).$$

It is clear that $F(x)$ is a non-decreasing bounded function given on $D$. It can easily be extended by continuity from the left to a non-decreasing function on the whole real line. Now we see that the sequence $\{F_{nn}\}$ and the function $F$ satisfy the conditions of Lemma A4.1. The theorem is proved. □

The conditions of Helly's theorem can be weakened. Namely, instead of $\mathcal{F}$ we could consider a wider class $\mathcal{H}$ of non-decreasing left continuous (i.e., satisfying properties F1 and F3) functions $H$ majorised by a fixed function: for any $x$, $|H(x)| \le N(x) < \infty$, where $N$ is a given function characterising the class $\mathcal{H}$. We do not exclude the case when $|H(x)|$ (or $N(x)$) grow unboundedly as $|x| \to \infty$. The following generalised version of Helly's theorem is true.

Theorem A4.2 (Generalised Helly theorem) Any sequence $H_n \in \mathcal{H}$ contains a subsequence $H_{n_k}$ which converges to a function $H \in \mathcal{H}$ at each point of continuity of $H$.

The Proof repeats the above proof of Helly's theorem. □

To each function $H_n \in \mathcal{H}$ we can associate a measure $\mu_n$ by putting

$$\mu_n([a, b)) = H_n(b) - H_n(a).$$

The generalised Helly theorem will then mean that, for any sequence of measures $\mu_n$ generated by functions from $\mathcal{H}$, there exists a subsequence $\mu_{n_k}$ converging weakly on each finite interval of which the endpoints are not atoms of the limiting measure $\mu$.

We give one more analogue of Helly's theorem which refers to a collection of equicontinuous functions $g_n$. Recall that a sequence of functions $\{g_n\}$ is said to be equicontinuous if, for any $\varepsilon > 0$, there exists a $\delta > 0$ such that $|x_1 - x_2| < \delta$ implies $|g_n(x_1) - g_n(x_2)| < \varepsilon$ for all $n$.

Theorem A4.3 (Arzela-Ascoli) Let $\{g_n\}$ be a sequence of uniformly bounded and equicontinuous functions of a real variable. Then there exists a subsequence $g_{n_k}$ converging to a continuous limit $g$ uniformly on each finite interval.


Proof Choose again a countable everywhere dense subset $\{x_n\}$ of the real line, and a subsequence $\{g_{n_k}\}$ converging at the points $x_1, x_2, \ldots$ Denote its limit at the point $x_j$ by $g(x_j)$. We have

$$|g_{n_k}(x) - g_{n_r}(x)| \le |g_{n_k}(x) - g_{n_k}(x_j)| + |g_{n_r}(x) - g_{n_r}(x_j)| + |g_{n_k}(x_j) - g_{n_r}(x_j)|. \tag{A4.1}$$

By assumption, the last term on the right-hand side tends to 0 as $n_k \to \infty$, $n_r \to \infty$. By virtue of equicontinuity, for any point $x$ there exists a point $x_j$ such that, for all $n$,

$$|g_n(x) - g_n(x_j)| < \varepsilon. \tag{A4.2}$$

In any given finite interval $I$ there exists a finite collection of points $x_j$ such that, for every point $x \in I$, (A4.2) holds with some $x_j$ from this collection. This implies that the right-hand side of (A4.1) will be less than $3\varepsilon$ for all sufficiently large $n_k$ and $n_r$ uniformly over $x \in I$. Thus there exists the limit $g(x) = \lim g_{n_k}(x)$, for which by (A4.2) we have $|g(x) - g(x_j)| \le \varepsilon$, which implies that $g$ is continuous. The theorem is proved. □


Appendix  5 

The  Proof  of  the  Berry-Esseen  Theorem 


In  this  appendix  we  prove  the  following  assertion  stated  in  Sect.  8.5. 

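Theorem A5.1 (Berry-Esseen) Let $\xi_k$ be independent identically distributed random variables,

$$\mathbf{E}\xi_k = 0, \quad \operatorname{Var}(\xi_k) = 1, \quad \mu = \mathbf{E}|\xi_k|^3 < \infty, \quad S_n = \sum_{k=1}^{n} \xi_k, \quad \zeta_n = \frac{S_n}{\sqrt{n}}.$$

Then, for all $n$,

$$\Delta_n := \sup_x |\mathbf{P}(\zeta_n < x) - \Phi(x)| < \frac{c\mu}{\sqrt{n}},$$

where $\Phi$ is the standard normal distribution function and $c$ is an absolute constant.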
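The order $1/\sqrt{n}$ in the theorem can be observed numerically. The following sketch assumes the simplest example, summands equal to $+1$ or $-1$ with probability 1/2 each (so that the mean is 0, the variance is 1 and the third absolute moment is 1), and computes the uniform distance between the distribution function of the normalised sum and the standard normal distribution function. Function names are illustrative.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function expressed through math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def delta_n(n):
    """sup_x |P(S_n / sqrt(n) < x) - Phi(x)| for summands +1 or -1 with probability 1/2."""
    total = 2 ** n
    sup_diff = 0.0
    cdf_left = 0.0  # probability of values strictly below the current atom
    for k in range(n + 1):
        x = (2 * k - n) / math.sqrt(n)  # atom of the normalised sum
        phi = std_normal_cdf(x)
        cdf_right = cdf_left + math.comb(n, k) / total
        # The step function is constant between atoms and Phi is monotone,
        # so the supremum is reached on one-sided limits at the atoms.
        sup_diff = max(sup_diff, abs(cdf_left - phi), abs(cdf_right - phi))
        cdf_left = cdf_right
    return sup_diff

for n in (1, 2, 10, 100, 1000):
    d = delta_n(n)
    print(n, d, d * math.sqrt(n))
```

In this example the scaled distances in the last printed column stay bounded as $n$ grows, which is consistent with the bound of the theorem.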


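Proof We will make use of the composition method. As in Sect. 8.5, we will bound $\Delta_n$ based on estimates of smallness of $\mathbf{E}g(\zeta_n) - \mathbf{E}g(\eta)$, $\eta \subset\!\!\!= \Phi_{0,1}$, for smooth $g$. To get a bound for $\Delta_n$ in Sect. 8.5, we chose $g$ to be a function constant outside a small interval. The next lemma shows that such a choice is not obligatory. Let $G$ be a distribution function and $\gamma \subset\!\!\!= G$ be independent of $\zeta_n$ and $\eta$. Put

$$g(z) := G\left(\frac{x - z}{\varepsilon}\right),$$

so that

$$\mathbf{E}g(\zeta_n) = \mathbf{E}G\left(\frac{x - \zeta_n}{\varepsilon}\right) = \mathbf{P}(\zeta_n + \varepsilon\gamma < x), \qquad \mathbf{E}g(\eta) = \mathbf{P}(\eta + \varepsilon\gamma < x).$$

Set

$$\Delta_{n,\varepsilon} := \sup_x \left|\mathbf{E}G\left(\frac{x - \zeta_n}{\varepsilon}\right) - \mathbf{E}G\left(\frac{x - \eta}{\varepsilon}\right)\right| = \sup_x |\mathbf{P}(\zeta_n + \varepsilon\gamma < x) - \mathbf{P}(\eta + \varepsilon\gamma < x)|$$

$$= \sup_x \left|\int dG(y)\,[\mathbf{P}(\zeta_n < x - \varepsilon y) - \mathbf{P}(\eta < x - \varepsilon y)]\right|.$$

Clearly, $\Delta_{n,\varepsilon} \le \Delta_n$. Our aim will be to obtain a converse inequality for $\Delta_n$.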
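Lemma A5.1 Let $v > 0$ be such that $G(v) - G(-v) \ge 3/4$. Then, for any $\varepsilon > 0$,

$$\Delta_n \le 2\Delta_{n,\varepsilon} + \frac{3v\varepsilon}{\sqrt{2\pi}}.$$

Proof Assume that the $\sup_x$ in the definition of $\Delta_n$ is attained on a positive value $\Delta_n(x) := F_n(x) - \Phi(x)$ (the case of a negative value $\Delta_n(x)$ is similar) and that, for a given $\delta > 0$, the value $x_\delta$ is such that

$$\Delta_n(x_\delta) = F_n(x_\delta) - \Phi(x_\delta) \ge \Delta_n - \delta,$$

where $F_n$ is the distribution function of $\zeta_n$. When the argument increases, the value of $\Delta_n(x_\delta)$ varies little in the following sense. Let $|y| < v$. Then $v - y > 0$ and

$$\Delta_n(x_\delta + \varepsilon(v - y)) = F_n(x_\delta + \varepsilon(v - y)) - \Phi(x_\delta + \varepsilon(v - y)) \ge F_n(x_\delta) - \Phi(x_\delta) - [\Phi(x_\delta + \varepsilon(v - y)) - \Phi(x_\delta)].$$

Here the difference in the brackets does not exceed $\varepsilon(v - y)\Phi'(0) \le 2v\varepsilon/\sqrt{2\pi}$, and hence

$$\Delta_n(x_\delta + \varepsilon(v - y)) \ge \Delta_n - \delta - \frac{2v\varepsilon}{\sqrt{2\pi}}.$$

Therefore

$$\Delta_{n,\varepsilon} \ge \int dG(y)\,\Delta_n(x_\delta + \varepsilon v - \varepsilon y) = \int_{|y|<v} + \int_{|y|\ge v} \ge \frac{3}{4}\left(\Delta_n - \delta - \frac{2v\varepsilon}{\sqrt{2\pi}}\right) - \frac{1}{4}\Delta_n = \frac{\Delta_n}{2} - \frac{3}{4}\left(\delta + \frac{2v\varepsilon}{\sqrt{2\pi}}\right).$$

Since $\delta$ is arbitrary, the assertion of the lemma follows. □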


Corollary A5.1 For $G = \Phi$ ($\gamma \subset\!\!\!= \Phi_{0,1}$) the value $v = 6/5$ satisfies the condition of Lemma A5.1, and

$$\Delta_n \le 2(\Delta_{n,\varepsilon} + \varepsilon). \tag{A5.1}$$
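The two numerical facts behind Corollary A5.1 are easy to verify: the standard normal mass of $[-6/5, 6/5]$ is at least 3/4, and the coefficient $3v/\sqrt{2\pi}$ from Lemma A5.1 does not exceed 2 for $v = 6/5$. A minimal sketch (variable names are illustrative):

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function expressed through math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

v = 6.0 / 5.0
# G(v) - G(-v) for G = Phi; Lemma A5.1 needs this to be at least 3/4.
mass = std_normal_cdf(v) - std_normal_cdf(-v)
# Coefficient of epsilon in the bound of Lemma A5.1.
coeff = 3.0 * v / math.sqrt(2.0 * math.pi)
print(mass, coeff)
```

Since the coefficient is below 2, the bound of Lemma A5.1 yields (A5.1).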


At the next stage of the proof we bound $\Delta_{n,\varepsilon}$, and it is at that stage where the composition method will be used. Put

$$u(n) := \max_{k \le n} \frac{\Delta_k \sqrt{k}}{\mu}, \qquad \alpha^2 := \varepsilon^2 n.$$

By letters $c$ (with or without indices) we will denote absolute constants, not necessarily the same ones.

Lemma A5.2 For $\alpha \ge 1$,

$$\Delta_{n,\varepsilon} \le \frac{c\mu}{\sqrt{n}}\left(1 + \frac{\mu\,u(n-1)}{\alpha}\right). \tag{A5.2}$$


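Proof Set $H_n := \sum_{k=1}^{n} \eta_k$, where $\eta_k \subset\!\!\!= \Phi_{0,1}$ are independent of each other and of $S_n$ and $\gamma$. The composition method is based on the following identity (cf. Theorem 8.5.1 and identity (8.5.3), $\eta \subset\!\!\!= \Phi_{0,1}$):

$$\mathbf{P}(\zeta_n + \varepsilon\gamma < x) - \mathbf{P}(\eta + \varepsilon\gamma < x) = \mathbf{P}(S_n + \alpha\gamma < x\sqrt{n}) - \mathbf{P}(H_n + \alpha\gamma < x\sqrt{n})$$

$$= \sum_{m=1}^{n} \big[\mathbf{P}(S_{m-1} + \xi_m + (H_n - H_m) + \alpha\gamma < x\sqrt{n}) - \mathbf{P}(S_{m-1} + \eta_m + (H_n - H_m) + \alpha\gamma < x\sqrt{n})\big].$$

Since for $\gamma \subset\!\!\!= \Phi_{0,1}$ one has $H_n - H_m + \alpha\gamma \subset\!\!\!= \Phi_{0,\,n-m+\alpha^2}$, the last sum is equal to $\sum_{m=1}^{n} D_m$, where

$$D_m := \mathbf{E}\left[\Phi\left(\frac{x\sqrt{n} - S_{m-1} - \xi_m}{d_m}\right) - \Phi\left(\frac{x\sqrt{n} - S_{m-1} - \eta_m}{d_m}\right)\right] = \mathbf{E}\left[\Phi\left(T_m - \frac{\xi_m}{d_m}\right) - \Phi\left(T_m - \frac{\eta_m}{d_m}\right)\right],$$

$$d_m := \sqrt{n - m + \alpha^2}, \qquad T_m := \frac{x\sqrt{n} - S_{m-1}}{d_m}.$$

To bound $D_m$ we will adopt the same approach as in Lemma 8.5.1. Because the first two moments of $\xi_m$ and $\eta_m$ coincide, expanding $\Phi$ into a series yields

$$|D_m| \le \frac{c\mu}{d_m^3} \sup_t |\mathbf{E}\varphi''(T_m + t)|,$$

where $\varphi(x) = \Phi'(x)$ and $\varphi'' = \Phi'''$. Since the function $\varphi''$ is bounded,

$$|D_m| \le \frac{c\mu}{d_m^3}. \tag{A5.3}$$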


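We will also need another bound for $D_m$. To obtain it, consider the quantity

$$\sup_t |\mathbf{E}\varphi''(T_m + t)| \le \sup_t \big|\mathbf{E}[\varphi''(T_m + t) - \varphi''(V_m + t)]\big| + \sup_t |\mathbf{E}\varphi''(V_m + t)|, \tag{A5.4}$$

where $V_m$ is defined in the same way as $T_m$ but with $S_{m-1}$ replaced by $H_{m-1}$. Integrating by parts yields

$$\big|\mathbf{E}[\varphi''(T_m + t) - \varphi''(V_m + t)]\big| = \left|\int \varphi''(u + t)\,d[\mathbf{P}(T_m < u) - \mathbf{P}(V_m < u)]\right|$$

$$= \left|\int \varphi'''(u + t)[\mathbf{P}(T_m < u) - \mathbf{P}(V_m < u)]\,du\right| \le \Delta_{m-1} \int |\varphi'''(u)|\,du = c\,\Delta_{m-1},$$

since $|\mathbf{P}(T_m < u) - \mathbf{P}(V_m < u)| \le \Delta_{m-1}$ (the variables $T_m$ and $V_m$ are obtained from $S_{m-1}/\sqrt{m-1}$ and $H_{m-1}/\sqrt{m-1}$, respectively, by one and the same linear transformation).

To bound the second summand on the right-hand side of (A5.4), note that

$$\mathbf{E}\varphi''(V_m + t) = \int \varphi''(u + t)\,\frac{1}{\sigma_m}\varphi\left(\frac{u - a_m}{\sigma_m}\right) du, \tag{A5.5}$$

where

$$\sigma_m^2 = \frac{m - 1}{d_m^2} = \frac{m - 1}{n - m + \alpha^2}, \qquad a_m = \frac{x\sqrt{n}}{d_m},$$

so that $\frac{1}{\sigma_m}\varphi\left(\frac{u - a_m}{\sigma_m}\right)$ is the density of $V_m = \frac{x\sqrt{n} - H_{m-1}}{d_m}$. Integrating the right-hand side of (A5.5) twice by parts, we obtain

$$|\mathbf{E}\varphi''(V_m + t)| = \frac{1}{\sigma_m^3}\left|\int \varphi(u + t)\,\varphi''\left(\frac{u - a_m}{\sigma_m}\right) du\right| \le \frac{c}{\sigma_m^3}.$$

Thus,

$$\sup_t |\mathbf{E}\varphi''(T_m + t)| \le c_1 \Delta_{m-1} + \frac{c}{\sigma_m^3}, \qquad |D_m| \le c\mu\left(\frac{\Delta_{m-1}}{d_m^3} + \frac{1}{(m - 1)^{3/2}}\right).$$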

The bounds derived for $D_m$ do not depend on $x$. Therefore, using the bound just obtained for $m > n/2$, and bound (A5.3) for $m \le n/2$ (the latter bound implies then that $|D_m| \le c\mu/n^{3/2}$), we get

$$\Delta_{n,\varepsilon} \le c\mu\left[\sum_{m \le n/2} n^{-3/2} + \sum_{m > n/2} \frac{\Delta_{m-1}}{d_m^3} + \sum_{m > n/2} \frac{1}{(m - 1)^{3/2}}\right]. \tag{A5.6}$$

Here the first sum does not exceed $(n/2)n^{-3/2} = 1/(2\sqrt{n})$ and the last sum does not exceed

$$\int_{n/2 - 1}^{\infty} \frac{ds}{s^{3/2}} \le \frac{c}{\sqrt{n}}.$$

It remains to bound the middle sum. Setting $u(n) := \max_{k \le n} (\Delta_k \sqrt{k})/\mu$, we have

$$\sum_{m > n/2} \frac{\Delta_{m-1}}{d_m^3} \le \frac{c\mu\,u(n-1)}{\sqrt{n}} \sum_{m > n/2} \frac{1}{(n - m + \alpha^2)^{3/2}}.$$

The last sum does not exceed

$$\sum_{k=0}^{\infty} \frac{1}{(k + \alpha^2)^{3/2}} \le \frac{1}{\alpha^3} + \int_0^{\infty} \frac{dz}{(z + \alpha^2)^{3/2}} = \frac{1}{\alpha^3} + \frac{2}{\alpha} \le \frac{3}{\alpha},$$

provided that $\alpha \ge 1$. Collecting (A5.6) and the above estimates together, we obtain the assertion of the lemma. □


We now turn directly to the proof of the theorem. By virtue of (A5.1) and (A5.2),

$$v(n) := \frac{\Delta_n \sqrt{n}}{\mu} \le \frac{2\sqrt{n}}{\mu}\,\Delta_{n,\varepsilon} + \frac{2\alpha}{\mu} \le 2c + \frac{2c\mu\,u(n-1)}{\alpha} + \frac{2\alpha}{\mu}.$$

Put here $\alpha := \max(4c\mu, 1)$. Then ($\mu \ge 1$)

$$v(n) \le c_1 + \frac{u(n-1)}{2}.$$

This implies that $u(n) \le 2c_1$ for all $n$. To verify this, we make use of induction. Clearly, $u(1) = v(1) \le 1 \le 2c_1$. Let $u(n-1) \le 2c_1$. Then $v(n) \le 2c_1$ and $u(n) = \max(v(n), u(n-1)) \le 2c_1$. The theorem is proved. □


Appendix  6 

The  Basic  Properties  of  Regularly  Varying 
Functions  and  Subexponential  Distributions 


The properties of regularly varying functions and subexponential distributions were used in Sects. 8.8, 9.4-9.6 and 12.7 and will be used in Appendices 7 and 8.


6.1  General  Properties  of  Regularly  Varying  Functions 

Definition A6.1.1 A positive measurable function $L(t)$ is called a slowly varying function (s.v.f.) as $t \to \infty$ if, for any fixed $v > 0$,

$$\frac{L(vt)}{L(t)} \to 1 \quad \text{as } t \to \infty. \tag{A6.1.1}$$

A function $V(t)$ is called a regularly varying function (r.v.f.) (with exponent $-\beta \in \mathbb{R}$) as $t \to \infty$ if it can be represented as

$$V(t) = t^{-\beta} L(t), \tag{A6.1.2}$$

where $L(t)$ is an s.v.f. as $t \to \infty$. We will denote the class of all r.v.f.s by $\mathcal{R}$.


The definitions of an s.v.f. and r.v.f. as $t \downarrow 0$ are quite similar. In what follows, the term s.v.f. (r.v.f.) will (unless specified otherwise) always refer to a slowly (regularly) varying function at infinity.

It is easy to see that, similarly to (A6.1.1), a characteristic property of regularly varying functions is the convergence, for any fixed $v > 0$,

$$\frac{V(vt)}{V(t)} \to v^{-\beta} \quad \text{as } t \to \infty. \tag{A6.1.3}$$

Thus, an s.v.f. is an r.v.f. with exponent zero.

Typical representatives of the class of s.v.f.s are the logarithmic function and its powers $\ln^{\gamma} t$, $\gamma \in \mathbb{R}$, their linear combinations, multiple logarithms, functions with the property $L(t) \to L = \text{const} \ne 0$ as $t \to \infty$, etc. As an example of a bounded oscillating s.v.f. one can give

$$L_0(t) = 2 + \sin(\ln\ln t), \quad t > 1.$$
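The defining convergence (A6.1.1) for the oscillating example can be observed numerically, in contrast with a power function. The sketch below assumes the fixed multiplier $v = 10$ and compares the ratio for the function $2 + \sin(\ln\ln t)$ with the ratio for $\sqrt{t}$, which is regularly varying with exponent $1/2$ and hence has the constant ratio $v^{1/2}$. Names are illustrative.

```python
import math

def L0(t):
    """The bounded oscillating s.v.f. 2 + sin(ln ln t) from the text, defined for t > 1."""
    return 2.0 + math.sin(math.log(math.log(t)))

def ratio(func, v, t):
    """The ratio func(v*t) / func(t) appearing in (A6.1.1) and (A6.1.3)."""
    return func(v * t) / func(t)

v = 10.0
for t in (1e2, 1e8, 1e32, 1e128):
    print(t, ratio(L0, v, t), ratio(math.sqrt, v, t))
```

The convergence to 1 in the first column of ratios is slow because it is governed by $\ln v / \ln t$.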

We  will  need  the  following  two  basic  properties  of  s.v.f.s. 

Theorem A6.1.1 (Uniform convergence theorem) If $L(t)$ is an s.v.f. as $t \to \infty$ then convergence (A6.1.1) holds uniformly in $v$ on any segment $[v_1, v_2]$, $0 < v_1 < v_2 < \infty$.

The theorem implies that the uniform convergence (A6.1.1) on the segment $[1/M, M]$ also takes place in the case when, as $t \to \infty$, the quantity $M = M(t)$ grows unboundedly slowly enough.


Theorem A6.1.2 (Integral representation) A function $L(t)$ is an s.v.f. as $t \to \infty$ if and only if for some $t_0 > 0$, one has

$$L(t) = c(t) \exp\left\{\int_{t_0}^{t} \frac{\varepsilon(u)}{u}\,du\right\}, \quad t \ge t_0, \tag{A6.1.4}$$

where the functions $c(t)$ and $\varepsilon(t)$ are measurable and such that $c(t) \to c \in (0, \infty)$ and $\varepsilon(t) \to 0$ as $t \to \infty$.

For instance, for $L(t) = \ln t$ representation (A6.1.4) is valid with $c(t) = 1$, $t_0 = e$ and $\varepsilon(t) = (\ln t)^{-1}$.
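The example with the logarithm can be verified directly. With $c(t) = 1$, $t_0 = e$ and $\varepsilon(u) = (\ln u)^{-1}$, the substitution $w = \ln u$ gives

```latex
\[
  c(t)\exp\Bigl\{\int_{e}^{t}\frac{\varepsilon(u)}{u}\,du\Bigr\}
  = \exp\Bigl\{\int_{e}^{t}\frac{du}{u\ln u}\Bigr\}
  = \exp\Bigl\{\int_{1}^{\ln t}\frac{dw}{w}\Bigr\}
  = \exp\{\ln\ln t\}
  = \ln t , \qquad t \ge e .
\]
```

Here $\varepsilon(t) = (\ln t)^{-1} \to 0$ as $t \to \infty$, as the representation requires.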


Proof of Theorem A6.1.1 Put

$$h(x) := \ln L(e^x). \tag{A6.1.5}$$

Then property (A6.1.1) of s.v.f.s is equivalent, for each $u \in \mathbb{R}$, to the condition that the convergence

$$h(x + u) - h(x) \to 0 \tag{A6.1.6}$$

takes place as $x \to \infty$. To prove the theorem, we need to show that this convergence is uniform in $u \in [u_1, u_2]$ for any fixed $u_i \in \mathbb{R}$. In order to do that, it suffices to verify that convergence (A6.1.6) is uniform on the segment $[0, 1]$. Indeed, from the obvious inequality

$$|h(x + u_1 + u_2) - h(x)| \le |h(x + u_1 + u_2) - h(x + u_1)| + |h(x + u_1) - h(x)| \tag{A6.1.7}$$

we have

$$|h(x + u) - h(x)| \le (u_2 - u_1 + 1) \sup_{y \in [0,1]} |h(x + y) - h(x)|, \quad u \in [u_1, u_2].$$

For a given $\varepsilon \in (0, 1)$ and an $x > 0$, set $I_x := [x, x + 2]$,

$$I_x^* := \{u \in I_x : |h(u) - h(x)| \ge \varepsilon/2\}, \qquad I_{0,x}^* := \{u \in I_0 : |h(x + u) - h(x)| \ge \varepsilon/2\}.$$

Clearly, the sets $I_x^*$ and $I_{0,x}^*$ are measurable and differ from each other by a translation by $x$, so that $\mu(I_x^*) = \mu(I_{0,x}^*)$, where $\mu$ is the Lebesgue measure. By (A6.1.6) the indicator function of the set $I_{0,x}^*$ converges, at each point $u \in I_0$, to 0 as $x \to \infty$. Therefore, by the dominated convergence theorem, the integral of this function, being equal to $\mu(I_{0,x}^*)$, converges to 0, so that $\mu(I_x^*) < \varepsilon/2$ for $x \ge x_0$, where $x_0$ is large enough.

Further, for $s \in [0, 1]$, the segment $I_x \cap I_{x+s} = [x + s, x + 2]$ has length $2 - s \ge 1$, so that, for $x \ge x_0$, the set

$$(I_x \cap I_{x+s}) \setminus (I_x^* \cup I_{x+s}^*)$$

has measure $\ge 1 - \varepsilon > 0$ and hence is non-empty. Let $y$ be a point from this set. Then

$$|h(x + s) - h(x)| \le |h(x + s) - h(y)| + |h(y) - h(x)| < \varepsilon/2 + \varepsilon/2 = \varepsilon$$

for $x \ge x_0$, which proves the required uniformity on $[0, 1]$ and hence on any fixed segment. The theorem is proved. □


Proof of Theorem A6.1.2 The fact that the right-hand side of (A6.1.4) is an s.v.f. is almost obvious: for any fixed $v \ne 1$,

$$\frac{L(vt)}{L(t)} = \frac{c(vt)}{c(t)} \exp\left\{\int_t^{vt} \frac{\varepsilon(u)}{u}\,du\right\}, \tag{A6.1.8}$$

where $c(vt)/c(t) \to c/c = 1$ and, as $t \to \infty$,

$$\int_t^{vt} \frac{\varepsilon(u)}{u}\,du = o\left(\int_t^{vt} \frac{du}{u}\right) = o(\ln v) = o(1). \tag{A6.1.9}$$

We now prove that any s.v.f. admits the representation (A6.1.4). The required representation in terms of the function (A6.1.5) is equivalent (after substituting $t = e^x$) to the relation

$$h(x) = d(x) + \int_{x_0}^{x} \delta(y)\,dy, \tag{A6.1.10}$$

where $d(x) = \ln c(e^x) \to d \in \mathbb{R}$ and $\delta(x) = \varepsilon(e^x) \to 0$ as $x \to \infty$, $x_0 = \ln t_0$. Therefore it suffices to establish representation (A6.1.10) for the function $h(x)$.

First of all note that $h(x)$ (as well as $L(t)$) is a "locally bounded" function. Indeed, Theorem A6.1.1 implies that, for $x_0$ large enough and all $x \ge x_0$,

$$\sup_{0 \le y \le 1} |h(x + y) - h(x)| < 1.$$

Hence, for any $x > x_0$, we have by virtue of (A6.1.7) the bound

$$|h(x) - h(x_0)| \le x - x_0 + 1.$$

Further, the local boundedness and measurability of the function $h$ mean that it is locally integrable on $[x_0, \infty)$ and hence can be represented for $x \ge x_0$ as

$$h(x) = \int_{x_0}^{x_0+1} h(y)\,dy + \int_0^1 (h(x) - h(x + y))\,dy + \int_{x_0}^{x} (h(y + 1) - h(y))\,dy. \tag{A6.1.11}$$

The first integral in (A6.1.11) is a constant, which will be denoted by $d$. The second integral, by virtue of Theorem A6.1.1, converges to zero as $x \to \infty$, so that

$$d(x) := d + \int_0^1 (h(x) - h(x + y))\,dy \to d, \quad x \to \infty.$$

As for the third integral in (A6.1.11), by the definition of an s.v.f., the integrand satisfies

$$\delta(y) := h(y + 1) - h(y) \to 0$$

as $y \to \infty$, which completes the proof of representation (A6.1.10). □


6.2  The  Basic  Asymptotic  Properties 


In  this  section  we  will  obtain  a  number  of  consequences  of  Theorems  A6.1.1  and 
A6.1.2  that  are  related  to  the  asymptotic  behaviour  of  s.v.f.s  and  r.v.f.s. 


Theorem A6.2.1 (i) If $L_1$ and $L_2$ are s.v.f.s then $L_1 + L_2$, $L_1 L_2$, $L_1^b$ and $L(t) := L_1(at + b)$, where $a > 0$ and $b \in \mathbb{R}$, are also s.v.f.s.

(ii) If $L$ is an s.v.f. then, for any $\delta > 0$, there exists a $t_\delta > 0$ such that

$$t^{-\delta} \le L(t) \le t^{\delta} \quad \text{for all } t \ge t_\delta. \tag{A6.2.1}$$

In other words, $L(t) = t^{o(1)}$ as $t \to \infty$.

(iii) If $L$ is an s.v.f. then, for any $\delta > 0$ and $v_0 > 1$, there exists a $t_\delta > 0$ such that, for all $v \ge v_0$ and $t \ge t_\delta$,

$$v^{-\delta} \le \frac{L(vt)}{L(t)} \le v^{\delta}. \tag{A6.2.2}$$

(iv) (Karamata's theorem) If an r.v.f. $V$ in (A6.1.2) has exponent $-\beta$, $\beta > 1$, then

$$V^I(t) := \int_t^{\infty} V(u)\,du \sim \frac{tV(t)}{\beta - 1} \quad \text{as } t \to \infty. \tag{A6.2.3}$$

If $\beta < 1$ then

$$V_I(t) := \int_0^{t} V(u)\,du \sim \frac{tV(t)}{1 - \beta} \quad \text{as } t \to \infty. \tag{A6.2.4}$$

If $\beta = 1$ then

$$V_I(t) = tV(t)L_1(t) \tag{A6.2.5}$$

and

$$V^I(t) = tV(t)L_2(t) \quad \text{if } \int_0^{\infty} V(u)\,du < \infty, \tag{A6.2.6}$$

where $L_i(t) \to \infty$ as $t \to \infty$, $i = 1, 2$, are s.v.f.s.

(v) For an r.v.f. $V$ with exponent $-\beta < 0$, put

$$b(t) := V^{(-1)}(1/t) = \inf\{u : V(u) < 1/t\}.$$

Then $b(t)$ is an r.v.f. with exponent $1/\beta$:

$$b(t) = t^{1/\beta} L_b(t), \tag{A6.2.7}$$

where $L_b$ is an s.v.f. If the function $L$ possesses the property

$$L(tL^{1/\beta}(t)) \sim L(t) \tag{A6.2.8}$$

as $t \to \infty$ then

$$L_b(t) \sim L^{1/\beta}(t^{1/\beta}). \tag{A6.2.9}$$

Similar assertions hold for functions slowly/regularly varying as $t \downarrow 0$.
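Relation (A6.2.3) can be checked on an explicit example. Assume, for illustration only, the regularly varying function $V(t) = t^{-2}\ln t$ for $t > 1$, so that $\beta = 2$ and $L(t) = \ln t$ (this choice is not from the text). Integration by parts gives a closed form for the tail integral:

```latex
\[
  V^I(t) = \int_t^{\infty} \frac{\ln u}{u^{2}}\,du
         = \Bigl[-\frac{\ln u}{u}\Bigr]_t^{\infty} + \int_t^{\infty}\frac{du}{u^{2}}
         = \frac{\ln t + 1}{t},
  \qquad
  \frac{tV(t)}{\beta - 1} = \frac{\ln t}{t},
\]
\[
  \frac{V^I(t)}{tV(t)/(\beta - 1)} = 1 + \frac{1}{\ln t} \to 1
  \quad\text{as } t \to \infty .
\]
```

The ratio tends to 1 only at a logarithmic rate, which reflects the influence of the slowly varying factor.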

Note that Theorem A6.1.1 and inequality (A6.2.2) imply the following property of s.v.f.s: for any $\delta > 0$ there exists a $t_\delta > 0$ such that, for all $t$ and $v$ satisfying the inequalities $t \ge t_\delta$ and $vt \ge t_\delta$, we have

$$(1 - \delta)\min\{v^{\delta}, v^{-\delta}\} \le \frac{L(vt)}{L(t)} \le (1 + \delta)\max\{v^{\delta}, v^{-\delta}\}. \tag{A6.2.10}$$

Proof of Theorem A6.2.1 Assertion (i) is evident (just note that, in order to prove the last part of (i), one needs Theorem A6.1.1).

(ii) This property follows immediately from representation (A6.1.4) and the bound

$$\int_{t_0}^{t} \frac{\varepsilon(u)}{u}\,du = o(\ln t)$$

as $t \to \infty$.

(iii) In order to prove this property, notice that on the right-hand side of (A6.1.8), for any fixed $\delta > 0$ and $v_0 > 1$ and all $t$ large enough, we have

$$v^{-\delta/2} \le v_0^{-\delta/2} \le \frac{c(vt)}{c(t)} \le v_0^{\delta/2} \le v^{\delta/2}, \quad v \ge v_0,$$

and

$$\left|\int_t^{vt} \frac{\varepsilon(u)}{u}\,du\right| \le \frac{\delta}{2}\ln v$$

(by virtue of (A6.1.9)). This implies (A6.2.2).

(iv) By the uniform convergence theorem (Theorem A6.1.1), we can choose an $M = M(t) \to \infty$ as $t \to \infty$ such that the convergence in (A6.1.1) will be uniform in $v \in [1, M]$. Changing the variable $u = vt$, we obtain

$$V^I(t) = t^{-\beta+1} L(t) \int_1^{\infty} v^{-\beta} \frac{L(vt)}{L(t)}\,dv = t^{-\beta+1} L(t) \left[\int_1^{M} + \int_M^{\infty}\right]. \tag{A6.2.11}$$

If $\beta > 1$ then, as $t \to \infty$,

$$\int_1^{M} \sim \int_1^{M} v^{-\beta}\,dv \to \frac{1}{\beta - 1},$$

whereas by property (iii), for $\delta = (\beta - 1)/2$, we have

$$\int_M^{\infty} \le \int_M^{\infty} v^{-\beta+\delta}\,dv = \int_M^{\infty} v^{-(\beta+1)/2}\,dv \to 0.$$

These relations together imply

$$V^I(t) \sim \frac{t^{-\beta+1} L(t)}{\beta - 1} = \frac{tV(t)}{\beta - 1}.$$


The case $\beta < 1$ can be treated quite similarly, but taking into account the uniform in $v \in [1/M, 1]$ convergence in (A6.1.1) and the equality

$$\int_0^1 v^{-\beta}\,dv = \frac{1}{1 - \beta}.$$

If $\beta = 1$ then the first integral on the right-hand side of (A6.2.11) is

$$\int_1^{M} \sim \int_1^{M} v^{-1}\,dv = \ln M,$$

so that if

$$\int_0^{\infty} V(u)\,du < \infty \tag{A6.2.12}$$

then

$$V^I(t) \ge (1 + o(1))L(t)\ln M \gg L(t)$$

and hence

$$L_2(t) := \frac{V^I(t)}{tV(t)} = \frac{V^I(t)}{L(t)} \to \infty \quad \text{as } t \to \infty. \tag{A6.2.13}$$

Note now that, by property (i), the function $L_2$ will be an s.v.f. whenever $V^I(t)$ is an s.v.f. But, for $v > 1$,

$$V^I(t) = V^I(vt) + \int_t^{vt} V(u)\,du,$$

where the last integral clearly does not exceed $(v - 1)L(t)(1 + o(1))$. By (A6.2.13) this implies that $V^I(vt)/V^I(t) \to 1$ as $t \to \infty$, which completes the proof of (A6.2.6).

That relation (A6.2.5) is true in the subcase when (A6.2.12) holds is almost obvious, since

$$V_I(t) = tV(t)L_1(t) = L(t)L_1(t) = \int_0^{t} V(u)\,du \to \int_0^{\infty} V(u)\,du,$$

so that, firstly, $L_1$ is an s.v.f. by property (i) and, secondly, $L_1(t) \to \infty$ because $L(t) \to 0$ by (A6.2.13).

Now let $\beta = 1$ and $\int_0^{\infty} V(u)\,du = \infty$. Then, as $M = M(t) \to \infty$ slowly enough, similarly to (A6.2.11) and (A6.2.13), by the uniform convergence theorem we have

$$V_I(t) = \int_0^1 v^{-1} L(vt)\,dv \ge \int_{1/M}^{1} v^{-1} L(vt)\,dv \sim L(t)\ln M \gg L(t).$$

Therefore $L_1(t) := V_I(t)/L(t) \to \infty$ as $t \to \infty$. Further, also similarly to the above, we have, for $v \in (0, 1)$,

$$V_I(t) = V_I(vt) + \int_{vt}^{t} V(u)\,du,$$

where the last integral does not exceed $v^{-1}(1 - v)L(t)(1 + o(1)) \ll V_I(t)$, so that $V_I(t)$ (as well as $L_1(t)$ by virtue of property (i)) is an s.v.f. This completes the proof of property (iv).

(v) Clearly, by the uniform convergence theorem the quantity $b = b(t)$ is a solution to the "asymptotic equation"

$$V(b) \sim \frac{1}{t} \quad \text{as } t \to \infty \tag{A6.2.14}$$

(where the symbol $\sim$ can be replaced by the equality sign if the function $V$ is continuous and monotonically decreasing). Substituting $t^{1/\beta} L_b(t)$ for $b$, we obtain an equivalent relation

$$L_b^{-\beta} L(t^{1/\beta} L_b) \sim 1, \tag{A6.2.15}$$

where clearly

$$t^{1/\beta} L_b \to \infty \quad \text{as } t \to \infty. \tag{A6.2.16}$$

Fix an arbitrary $v > 0$. Substituting $vt$ for $t$ in (A6.2.15) and setting, for brevity's sake, $L_2 = L_2(t) := L_b(vt)$, we get the relation

$$L_2^{-\beta} L(t^{1/\beta} L_2) \sim 1, \tag{A6.2.17}$$

since $L(v^{1/\beta} t^{1/\beta} L_2) \sim L(t^{1/\beta} L_2)$ by virtue of (A6.2.16) (with $L_b$ replaced with $L_2$). Now we will show by contradiction that (A6.2.15)-(A6.2.17) imply that $L_b \sim L_2$ as $t \to \infty$, which obviously means that $L_b$ is an s.v.f.

Indeed, the contrary assumption means that there exist $v_0 > 1$ and a sequence $t_n \to \infty$ such that

$$v_n := L_2(t_n)/L_b(t_n) > v_0, \quad n = 1, 2, \ldots \tag{A6.2.18}$$

(the possible alternative case can be dealt with in the same way). Clearly, $t_n^* := t_n^{1/\beta} L_b(t_n) \to \infty$ by (A6.2.16), so we obtain from (A6.2.15)-(A6.2.16) and property (iii) with $\delta = \beta/2$ that

$$\frac{L_2^{-\beta}(t_n)\,L(t_n^{1/\beta} L_2(t_n))}{L_b^{-\beta}(t_n)\,L(t_n^{1/\beta} L_b(t_n))} = v_n^{-\beta}\,\frac{L(v_n t_n^*)}{L(t_n^*)} \le v_n^{-\beta/2} \le v_0^{-\beta/2} < 1.$$

We get a contradiction, since by (A6.2.15) and (A6.2.17) the left-hand side tends to 1.

Note that the above argument proves the uniqueness (up to asymptotic equivalence) of the solution to Eq. (A6.2.14).

Finally, relation (A6.2.9) can be proved by a direct verification of (A6.2.14) for $b := t^{1/\beta} L^{1/\beta}(t^{1/\beta})$: using (A6.2.8), we have

$$V(b) = b^{-\beta} L(b) = \frac{L(t^{1/\beta} L^{1/\beta}(t^{1/\beta}))}{tL(t^{1/\beta})} \sim \frac{L(t^{1/\beta})}{tL(t^{1/\beta})} = \frac{1}{t}.$$

The required assertion follows now by the aforementioned uniqueness of the solution to the asymptotic equation (A6.2.14). Theorem A6.2.1 is proved. □


6.3  The  Asymptotic  Properties  of  the  Transforms  of  R.V.F.s 
(Abel-Type  Theorems) 

For an r.v.f. $V(t)$, its Laplace transform

$$\psi(\lambda) := \int_0^{\infty} e^{-\lambda t} V(t)\,dt < \infty$$

is defined for all $\lambda > 0$. The following asymptotic relations hold true for the transform.

Theorem A6.3.1 Assume that $V(t) \in \mathcal{R}$ (i.e. $V(t)$ has the form (A6.1.2)).

(i) If $\beta \in [0, 1)$ then

$$\psi(\lambda) \sim \frac{\Gamma(1 - \beta)}{\lambda}\,V(1/\lambda) \quad \text{as } \lambda \downarrow 0. \tag{A6.3.1}$$

(ii) If $\beta = 1$ and $\int_0^{\infty} V(t)\,dt = \infty$ then

$$\psi(\lambda) \sim V_I(1/\lambda) \quad \text{as } \lambda \downarrow 0, \tag{A6.3.2}$$

where $V_I(t) = \int_0^{t} V(u)\,du \to \infty$ is an s.v.f. such that $V_I(t) \gg L(t)$ as $t \to \infty$.

(iii) In any case, $\psi(\lambda) \uparrow V_I(\infty) = \int_0^{\infty} V(t)\,dt \le \infty$ as $\lambda \downarrow 0$.


Assertions (i) and (ii) are called Abelian theorems. If we resolve relation (A6.3.1) for $V$ then we obtain

$$V(t) \sim \frac{\psi(1/t)}{t\,\Gamma(1 - \beta)} \quad \text{as } t \to \infty.$$

Relations of this type will also be valid in the case when, instead of the regularity of the function $V$, one requires the monotonicity of $V$ and assumes that $\psi(\lambda)$ is an r.v.f. as $\lambda \downarrow 0$. Statements of such type are called Tauberian theorems. We will not need these theorems and so will not dwell on them.
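For a pure power function, relation (A6.3.1) holds as an exact equality, which gives a quick consistency check. Assume, for illustration only, $V(t) = t^{-\beta}$ with $\beta \in [0, 1)$ and $L(t) \equiv 1$ (this special case is not discussed in the text). The substitution $u = \lambda t$ yields

```latex
\[
  \psi(\lambda) = \int_0^{\infty} e^{-\lambda t}\,t^{-\beta}\,dt
  = \lambda^{\beta - 1}\int_0^{\infty} e^{-u}\,u^{-\beta}\,du
  = \Gamma(1-\beta)\,\lambda^{\beta-1}
  = \frac{\Gamma(1-\beta)}{\lambda}\,V(1/\lambda),
\]
```

since $V(1/\lambda) = \lambda^{\beta}$. For a general slowly varying factor the equality is replaced by asymptotic equivalence as $\lambda \downarrow 0$.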


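Proof of Theorem A6.3.1 (i) For any fixed $\varepsilon > 0$ we have

\[ \psi(\lambda) = \int_0^{\varepsilon/\lambda} e^{-\lambda t} V(t)\,dt + \int_{\varepsilon/\lambda}^\infty e^{-\lambda t} V(t)\,dt, \]

where, for the first integral on the right-hand side, for $\beta < 1$, by virtue of (A6.2.4) we have the following relation:

\[ \int_0^{\varepsilon/\lambda} e^{-\lambda t} V(t)\,dt \le \int_0^{\varepsilon/\lambda} V(t)\,dt \sim \frac{\varepsilon V(\varepsilon/\lambda)}{\lambda(1-\beta)} \quad \text{as } \lambda \downarrow 0. \tag{A6.3.3} \]

Changing the variable $\lambda t = u$, we can rewrite the second integral in the above representation for $\psi(\lambda)$ as

\[ \int_{\varepsilon/\lambda}^\infty e^{-\lambda t} V(t)\,dt = \frac{V(1/\lambda)}{\lambda} \int_\varepsilon^\infty e^{-u} u^{-\beta}\, \frac{L(u/\lambda)}{L(1/\lambda)}\,du = \frac{V(1/\lambda)}{\lambda} \left[ \int_\varepsilon^2 + \int_2^\infty \right]. \tag{A6.3.4} \]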


Each of the integrals on the right-hand side converges, as $\lambda \downarrow 0$, to the corresponding integral of $e^{-u}u^{-\beta}$: the former integral converges by the uniform convergence theorem (the convergence $L(u/\lambda)/L(1/\lambda) \to 1$ is uniform in $u \in [\varepsilon, 2]$), and the latter converges by virtue of (A6.1.1) and the dominated convergence theorem, since by Theorem A6.2.1(iii), for all $\lambda$ small enough, we have $L(u/\lambda)/L(1/\lambda) < u$ for $u > 2$. Therefore,

\[ \int_{\varepsilon/\lambda}^\infty e^{-\lambda t} V(t)\,dt \sim \frac{V(1/\lambda)}{\lambda} \int_\varepsilon^\infty u^{-\beta} e^{-u}\,du. \tag{A6.3.5} \]

Now note that, as $\lambda \downarrow 0$,

\[ \frac{\varepsilon V(\varepsilon/\lambda)}{\lambda} \Big/ \frac{V(1/\lambda)}{\lambda} = \varepsilon^{1-\beta}\, \frac{L(\varepsilon/\lambda)}{L(1/\lambda)} \to \varepsilon^{1-\beta}. \]

Since $\varepsilon > 0$ can be chosen arbitrarily small, this relation together with (A6.3.3) and (A6.3.5) completes the proof of (A6.3.1).

(ii) Integrating by parts and changing the variable $\lambda t = u$, we obtain, for $\beta = 1$ and $M > 0$, that

\[ \psi(\lambda) = \int_0^\infty e^{-\lambda t}\,dV_I(t) = -\int_0^\infty V_I(t)\,de^{-\lambda t} = \int_0^\infty V_I(u/\lambda) e^{-u}\,du = \int_0^{1/M} + \int_{1/M}^{M} + \int_M^\infty. \tag{A6.3.6} \]

By Theorem A6.2.1(iv), $V_I(t) \gg L(t)$ is an s.v.f. as $t \to \infty$. Therefore, by the uniform convergence theorem, for $M = M(\lambda) \to \infty$ slowly enough as $\lambda \to 0$, the middle integral on the right-hand side of (A6.3.6) is

\[ V_I(1/\lambda) \int_{1/M}^{M} \frac{V_I(u/\lambda)}{V_I(1/\lambda)}\, e^{-u}\,du \sim V_I(1/\lambda) \int_{1/M}^{M} e^{-u}\,du \sim V_I(1/\lambda). \]

The remaining two integrals are negligibly small: since $V_I(t)$ is an increasing function, the first integral does not exceed $V_I(1/(\lambda M))/M = o(V_I(1/\lambda))$, while for the last integral we have by Theorem A6.2.1(iii) that

\[ V_I(1/\lambda) \int_M^\infty \frac{V_I(u/\lambda)}{V_I(1/\lambda)}\, e^{-u}\,du \le V_I(1/\lambda) \int_M^\infty u e^{-u}\,du = o(V_I(1/\lambda)). \]

Hence (ii) is proved. Assertion (iii) is evident.


□ 


6.4  Subexponential  Distributions  and  Their  Properties 

Let $\xi, \xi_1, \xi_2, \ldots$ be independent identically distributed random variables with distribution $F$, and let the right tail of this distribution

\[ F_+(t) := F([t, \infty)) = \mathbf{P}(\xi \ge t), \quad t \in \mathbb{R}, \]

be an r.v.f. as $t \to \infty$ of the form (A6.1.2), which we will denote by $V(t)$. Recall that we denoted the class of all such distributions by $\mathcal{R}$.

In this section we will introduce one more class of distributions, which is substantially wider than $\mathcal{R}$.

Let $\zeta \in \mathbb{R}$ be a random variable with distribution $G$: $G(B) = \mathbf{P}(\zeta \in B)$ for any Borel set $B$ (recall that in this case we write $\zeta \subset\!\!\!= G$). Denote by $G(t)$ the right tail of the distribution of the random variable $\zeta$:

\[ G(t) := \mathbf{P}(\zeta \ge t), \quad t \in \mathbb{R}. \]

The convolution of tails $G_1(t)$ and $G_2(t)$ is the function

\[ G_1 * G_2(t) := -\int G_1(t-y)\,dG_2(y) = \int G_1(t-y)\,G_2(dy) = \mathbf{P}(Z_2 \ge t), \]

where $Z_2 = \zeta_1 + \zeta_2$ is the sum of independent random variables $\zeta_i \subset\!\!\!= G_i$, $i = 1, 2$. Clearly, $G_1 * G_2(t) = G_2 * G_1(t)$. Denote by $G^{2*}(t) := G * G(t)$ the convolution of the tail $G(t)$ with itself and put $G^{(n+1)*}(t) := G * G^{n*}(t)$, $n \ge 2$.

Definition A6.4.1 A distribution $G$ on $[0, \infty)$ belongs to the class $\mathcal{S}_+$ of subexponential distributions on the positive half-line if

\[ G^{2*}(t) \sim 2G(t) \quad \text{as } t \to \infty. \tag{A6.4.1} \]

A distribution $G$ on the whole line $(-\infty, \infty)$ belongs to the class $\mathcal{S}$ of subexponential distributions if the distribution $G^+$ of the positive part $\zeta^+ = \max\{0, \zeta\}$ of the random variable $\zeta \subset\!\!\!= G$ belongs to $\mathcal{S}_+$. A random variable is called subexponential if its distribution is subexponential.

As we will see below (Theorem A6.4.3), the subexponentiality property of a distribution $G$ is essentially the property of the asymptotics of the tail $G(t)$ as $t \to \infty$. Therefore we can also speak about subexponential functions.

A nonincreasing function $G_1(t)$ on $(0, \infty)$ is called subexponential if a distribution $G$ with the tail $G(t) \sim cG_1(t)$ as $t \to \infty$ with some $c > 0$ is subexponential. (For example, distributions with the tails $G(t) = G_1(t)/G_1(0)$ or $G(t) = \min(1, G_1(t))$.)

Remark A6.4.1 Since we obviously always have

\[ (G^+)^{2*}(t) = \mathbf{P}(\zeta_1^+ + \zeta_2^+ \ge t) \ge \mathbf{P}(\{\zeta_1^+ \ge t\} \cup \{\zeta_2^+ \ge t\}) \]
\[ = \mathbf{P}(\zeta_1 \ge t) + \mathbf{P}(\zeta_2 \ge t) - \mathbf{P}(\zeta_1 \ge t,\ \zeta_2 \ge t) = 2G(t) - G^2(t) = 2G^+(t)(1 + o(1)) \]

as $t \to \infty$, subexponentiality is equivalent to the following property:

\[ \limsup_{t \to \infty} \frac{(G^+)^{2*}(t)}{G^+(t)} \le 2. \tag{A6.4.2} \]

Note also that, since relation (A6.4.1) makes sense only when $G(t) > 0$ for all $t \in \mathbb{R}$, the support of any subexponential distribution is unbounded from the right.

We show that regularly varying distributions are subexponential, i.e., that $\mathcal{R} \subset \mathcal{S}$. Let $F \in \mathcal{R}$, so that $\mathbf{P}(\xi \ge t) = V(t)$ is an r.v.f. We need to show that

\[ \mathbf{P}(\xi_1 + \xi_2 \ge x) = V^{2*}(x) := V * V(x) = -\int V(x-t)\,dV(t) \sim 2V(x). \tag{A6.4.3} \]

In order to do that, we introduce events $A := \{\xi_1 + \xi_2 \ge x\}$ and $B_i := \{\xi_i < x/2\}$, $i = 1, 2$. Clearly,

\[ \mathbf{P}(A) = \mathbf{P}(AB_1) + \mathbf{P}(AB_2) - \mathbf{P}(AB_1B_2) + \mathbf{P}(A\overline{B}_1\overline{B}_2), \]

where $\mathbf{P}(AB_1B_2) = 0$, $\mathbf{P}(A\overline{B}_1\overline{B}_2) = \mathbf{P}(\overline{B}_1\overline{B}_2) = V^2(x/2)$ (here and in what follows, $\overline{B}$ denotes the event complementary to $B$) and

\[ \mathbf{P}(AB_1) = \mathbf{P}(AB_2) = \int_{-\infty}^{x/2} V(x-t)\,F(dt). \]

Therefore

\[ V^{2*}(x) = 2\int_{-\infty}^{x/2} V(x-t)\,F(dt) + V^2(x/2). \tag{A6.4.4} \]

(The same result can be obtained by integrating the convolution in (A6.4.3) by parts.) It remains to note that $V^2(x/2) = o(V(x))$ and

\[ \int_{-\infty}^{x/2} V(x-t)\,F(dt) = \int_{-\infty}^{-M} + \int_{-M}^{M} + \int_{M}^{x/2}, \tag{A6.4.5} \]

where, as one can easily see, for any $M = M(x) \to \infty$ as $x \to \infty$ such that $M = o(x)$, we have

\[ \int_{-M}^{M} \sim V(x) \quad \text{and} \quad \int_{-\infty}^{-M} + \int_{M}^{x/2} = o(V(x)), \]

which proves (A6.4.3).
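Relation (A6.4.3) is easy to observe numerically. The sketch below is an illustration only: the Pareto-type tail $V(t) = (1+t)^{-\alpha}$ on $[0, \infty)$, the function names and the midpoint quadrature are assumptions introduced here. It evaluates $V^{2*}(t)/V(t)$ through the decomposition $V^{2*}(t) = V(t) + \int_0^t V(t-y)\,F(dy)$, valid for nonnegative summands, and contrasts it with the exponential tail $e^{-t}$, for which the ratio equals $1 + t$ and is unbounded.

```python
def pareto_tail(t: float, alpha: float) -> float:
    """Illustrative regularly varying tail P(xi >= t) = (1 + t)**(-alpha), t >= 0."""
    return (1.0 + t) ** (-alpha) if t > 0.0 else 1.0


def conv_ratio_pareto(t: float, alpha: float, steps: int = 100000) -> float:
    """Approximate V^{2*}(t) / V(t) for the Pareto-type tail above."""
    h = t / steps
    integral = 0.0
    for j in range(steps):
        y = (j + 0.5) * h  # midpoint rule on [0, t]
        density = alpha * (1.0 + y) ** (-alpha - 1.0)  # F(dy) = density * dy
        integral += pareto_tail(t - y, alpha) * density
    integral *= h
    tail = pareto_tail(t, alpha)
    return (tail + integral) / tail


def conv_ratio_exponential(t: float) -> float:
    """For the tail e^{-t} the sum of two terms has tail e^{-t}(1 + t)."""
    return 1.0 + t


if __name__ == "__main__":
    for t in (10.0, 100.0, 1000.0):
        print(t, conv_ratio_pareto(t, alpha=1.5), conv_ratio_exponential(t))
```

For the regularly varying tail the ratio approaches 2, while for the exponential tail it grows without bound, so the exponential distribution is not subexponential.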

The same argument is valid for distributions with a right tail of the form

\[ F_+(t) = e^{-t^{\beta} L(t)}, \quad \beta \in (0, 1), \tag{A6.4.6} \]

where $L(t)$ is an s.v.f. as $t \to \infty$ satisfying a certain smoothness condition (for instance, that $L$ is differentiable with $L'(t) = o(L(t)/t)$ as $t \to \infty$).

One of the basic properties of subexponential distributions $G$ is that their tails $G(t)$ are asymptotically locally constant in the following sense.


Definition A6.4.2 We will call a function $G(t) > 0$ (asymptotically) locally constant (l.c.) if, for any fixed $v$,

\[ \frac{G(t+v)}{G(t)} \to 1 \quad \text{as } t \to \infty. \tag{A6.4.7} \]

In the literature, distributions with l.c. tails are often referred to as long-tailed distributions; however, we feel that the term “locally constant function” better reflects the meaning of the concept. Denote the class of all distributions $G$ with l.c. tails $G(t)$ by $\mathcal{L}$.
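A quick numerical illustration of Definition A6.4.2 (a sketch only; the two tails and the helper name `shift_ratio` are chosen here for illustration): for a power tail the ratio $G(t+v)/G(t)$ tends to 1, whereas for an exponential tail it stays equal to $e^{-v}$, so the latter is not l.c.

```python
import math


def shift_ratio(tail, t: float, v: float) -> float:
    """Return G(t + v) / G(t) for a tail function G."""
    return tail(t + v) / tail(t)


def power_tail(t: float) -> float:
    # Illustrative l.c. tail: (1 + t)**(-1.5)
    return (1.0 + t) ** (-1.5)


def exponential_tail(t: float) -> float:
    # Illustrative tail that is not l.c.: exp(-t)
    return math.exp(-t)


if __name__ == "__main__":
    for t in (10.0, 1e3, 1e6):
        print(t, shift_ratio(power_tail, t, 1.0))
    print(shift_ratio(exponential_tail, 50.0, 1.0), math.exp(-1.0))
```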

For future reference, we will state the basic properties of l.c. functions as a separate theorem.

Theorem A6.4.1 (i) For an l.c. function $G(t)$ the convergence in (A6.4.7) is uniform in $v$ on any fixed finite interval.

(ii) A function $G(t)$ is l.c. if and only if, for some $t_0 > 0$, it admits a representation of the form

\[ G(t) = c(t) \exp\left\{ \int_{t_0}^{t} \varepsilon(u)\,du \right\}, \quad t \ge t_0, \tag{A6.4.8} \]

where the functions $c(t)$ and $\varepsilon(t)$ are measurable and such that $c(t) \to c \in (0, \infty)$ and $\varepsilon(t) \to 0$ as $t \to \infty$.

(iii) If $G_1(t)$ and $G_2(t)$ are l.c. functions then $G_1(t) + G_2(t)$, $G_1(t)G_2(t)$, $G_1^b(t)$, and $G(t) := G_1(at + b)$, where $a \ge 0$ and $b \in \mathbb{R}$, are also l.c.

(iv) If $G(t)$ is an l.c. function then, for any $\varepsilon > 0$,

\[ e^{\varepsilon t} G(t) \to \infty \quad \text{as } t \to \infty. \]

In other words, any l.c. function $G(t)$ can be represented as

\[ G(t) = e^{-l(t)}, \quad l(t) = o(t) \quad \text{as } t \to \infty. \tag{A6.4.9} \]

(v) Let

\[ G^I(t) := \int_t^\infty G(u)\,du < \infty \]

and at least one of the following conditions be satisfied:

(a) $G(t)$ is an l.c. function; or

(b) $G^I(t)$ is an l.c. function and $G(t)$ is monotone.

Then

\[ G(t) = o(G^I(t)) \quad \text{as } t \to \infty. \tag{A6.4.10} \]

(vi) If $G \in \mathcal{L}$ then $G^{2*}(t) \sim (G^+)^{2*}(t)$ as $t \to \infty$.


Remark A6.4.2 Assertion (i) of the theorem implies that the uniform convergence in (A6.4.7) on the interval $[-M, M]$ persists in the case when $M = M(t)$ grows unboundedly as $t \to \infty$, provided that the growth is slow enough.


Proof of Theorem A6.4.1, (i)-(iii) It is clear from Definitions A6.4.1 and A6.4.2 that $G(t)$ is l.c. if and only if $L(t) := G(\ln t)$ is an s.v.f. Given this observation, assertion (i) follows directly from Theorem A6.1.1 (on uniform convergence of s.v.f.s), while assertions (ii) and (iii) follow from Theorems A6.1.2 and A6.2.1(i), respectively.

Assertion  (iv)  follows  from  the  integral  representation  (A6.4.8). 

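(v) If (a) holds then, for any $M > 0$ and all $t$ large enough,

\[ G^I(t) > \int_t^{t+M} G(u)\,du > \frac{1}{2} M G(t). \]

Since $M$ is arbitrary, $G^I(t) \gg G(t)$. Further, if (b) holds then

\[ \frac{G(t)}{G^I(t)} \le \frac{1}{G^I(t)} \int_{t-1}^{t} G(u)\,du = \frac{G^I(t-1)}{G^I(t)} - 1 \to 0 \]

as $t \to \infty$.

(vi) Let $\zeta_1$ and $\zeta_2$ be independent copies of a random variable $\zeta$, $Z_2 := \zeta_1 + \zeta_2$, $Z_2^{(+)} := \zeta_1^+ + \zeta_2^+$. Clearly, $\zeta_i \le \zeta_i^+$, so that

\[ G^{2*}(t) = \mathbf{P}(Z_2 \ge t) \le \mathbf{P}(Z_2^{(+)} \ge t) = (G^+)^{2*}(t). \tag{A6.4.11} \]

On the other hand, for any $M > 0$,

\[ G^{2*}(t) \ge \mathbf{P}(Z_2 \ge t,\ \zeta_1 > 0,\ \zeta_2 > 0) + \sum_{i=1}^{2} \mathbf{P}(Z_2 \ge t,\ \zeta_i \in [-M, 0]), \]

where the first term on the right-hand side is equal to $\mathbf{P}(Z_2^{(+)} \ge t,\ \zeta_1^+ > 0,\ \zeta_2^+ > 0)$, and the last two terms can be bounded as follows: since $G \in \mathcal{L}$, for any $\varepsilon > 0$ and $M$ and $t$ large enough,

\[ \mathbf{P}(Z_2 \ge t,\ \zeta_i \in [-M, 0]) \ge \mathbf{P}(\zeta_{3-i} \ge t + M,\ \zeta_i \in [-M, 0]) \]
\[ = G(t)\, \frac{G(t+M)}{G(t)}\, [\mathbf{P}(\zeta_i \le 0) - \mathbf{P}(\zeta_i < -M)] \]
\[ \ge (1 - \varepsilon) G(t) \mathbf{P}(\zeta_i^+ = 0) = (1 - \varepsilon) \mathbf{P}(Z_2^{(+)} \ge t,\ \zeta_i^+ = 0). \]

Thus we obtain for $G^{2*}(t)$ the lower bound

\[ G^{2*}(t) \ge \mathbf{P}(Z_2^{(+)} \ge t,\ \zeta_1^+ > 0,\ \zeta_2^+ > 0) + (1 - \varepsilon) \sum_{i=1}^{2} \mathbf{P}(Z_2^{(+)} \ge t,\ \zeta_i^+ = 0) \]
\[ \ge (1 - \varepsilon) \mathbf{P}(Z_2^{(+)} \ge t) = (1 - \varepsilon)(G^+)^{2*}(t). \]

Therefore (vi) is proved, as $\varepsilon$ can be arbitrarily small. The theorem is proved. □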

We return now to our discussion of subexponential distributions. First of all, we turn to the relationship between the classes $\mathcal{S}$ and $\mathcal{L}$.

Theorem A6.4.2 We have $\mathcal{S} \subset \mathcal{L}$, and hence all the assertions of Theorem A6.4.1 are valid for subexponential distributions as well.

Remark A6.4.3 The coinage of the term “subexponential distribution” was apparently due mostly to the fact that the tail of such a distribution decreases as $t \to \infty$ slower than any exponential function $e^{-\varepsilon t}$, as shown in Theorems A6.4.1(iv) and A6.4.2.

Remark A6.4.4 In the case when the distribution $G$ is not concentrated on $[0, \infty)$, the tails’ additivity condition (A6.4.1) alone is not sufficient for the function $G(t)$ to be l.c. (and hence for ensuring the “subexponential decay” of the distribution tail, cf. Remark A6.4.3). This explains the necessity of defining subexponentiality in the general case in terms of condition (A6.4.1) on the distribution $G^+$ of the random variable $\zeta^+$. Actually, as we will see below (Corollary A6.4.1), the subexponentiality of a distribution $G$ on $\mathbb{R}$ is equivalent to the combination of conditions (A6.4.1) (on $G$ itself) and $G \in \mathcal{L}$.

The next example shows that, for random variables assuming values of both signs, condition (A6.4.1), generally speaking, does not imply the subexponential behaviour of $G(t)$.

Example A6.4.1 Let $\mu > 0$ be fixed and the right tail of the distribution $G$ have the form

\[ G(t) = e^{-\mu t} V(t), \tag{A6.4.12} \]

where $V(t)$ is an r.v.f. vanishing as $t \to \infty$ and such that

\[ g(\mu) := \int_{-\infty}^{\infty} e^{\mu y}\,G(dy) < \infty. \]

Similarly to (A6.4.4) and (A6.4.5), we have

\[ G^{2*}(t) = 2\int_{-\infty}^{t/2} G(t-y)\,G(dy) + G^2(t/2), \]

where

\[ \int_{-\infty}^{t/2} G(t-y)\,G(dy) = e^{-\mu t} \int_{-\infty}^{t/2} e^{\mu y} V(t-y)\,G(dy) = e^{-\mu t} \left[ \int_{-\infty}^{-M} + \int_{-M}^{M} + \int_{M}^{t/2} \right]. \]

One can easily see that, for $M = M(t) \to \infty$ slowly enough as $t \to \infty$, we have

\[ \int_{-M}^{M} e^{\mu y} V(t-y)\,G(dy) \sim g(\mu) V(t), \quad \int_{-\infty}^{-M} + \int_{M}^{t/2} = o(V(t)), \]

while

\[ G^2(t/2) = e^{-\mu t} V^2(t/2) \le c\,e^{-\mu t} V^2(t) = o(G(t)). \]

Thus, we obtain

\[ G^{2*}(t) \sim 2g(\mu) e^{-\mu t} V(t) = 2g(\mu) G(t), \tag{A6.4.13} \]

and it is clear that we can always find a distribution $G$ (with a negative mean) such that $g(\mu) = 1$. In that case relation (A6.4.1) from the definition of subexponentiality will be satisfied, although $G(t)$ decreases exponentially fast and hence is not an l.c. function.

On the other hand, note that the class of distributions satisfying relation (A6.4.1) only is an extension of the class $\mathcal{S}$. Distributions in the former class possess many of the properties of distributions from $\mathcal{S}$.


Proof of Theorem A6.4.2 We have to prove that $\mathcal{S} \subset \mathcal{L}$. Since the definitions of both classes are given in terms of the right distribution tails, we can assume without loss of generality that $G \in \mathcal{S}_+$ (or just consider the distribution $G^+$). For independent (nonnegative) $\zeta_i \subset\!\!\!= G$ we have, for $t > 0$,

\[ G^{2*}(t) = \mathbf{P}(\zeta_1 + \zeta_2 \ge t) = \mathbf{P}(\zeta_1 \ge t) + \mathbf{P}(\zeta_1 + \zeta_2 \ge t,\ \zeta_1 < t) = G(t) + \int_0^t G(t-y)\,G(dy). \tag{A6.4.14} \]

Since $G(t)$ is non-increasing and $G(0) = 1$, it follows that, for $t > v > 0$,

\[ \frac{G^{2*}(t)}{G(t)} = 1 + \int_0^v \frac{G(t-y)}{G(t)}\,G(dy) + \int_v^t \frac{G(t-y)}{G(t)}\,G(dy) \ge 1 + [1 - G(v)] + \frac{G(t-v)}{G(t)}\,[G(v) - G(t)]. \]

Therefore, for $t$ large enough (such that $G(v) - G(t) > 0$),

\[ 1 \le \frac{G(t-v)}{G(t)} \le \frac{1}{G(v) - G(t)} \left[ \frac{G^{2*}(t)}{G(t)} - 2 + G(v) \right]. \]

Since $G \in \mathcal{S}_+$, the right-hand side of the last formula converges as $t \to \infty$ to the quantity $G(v)/G(v) = 1$ and hence $G \in \mathcal{L}$. The theorem is proved. □

The next theorem contains several important properties of subexponential distributions.

Theorem A6.4.3 Let $G \in \mathcal{S}$.

(i) If $G_i(t)/G(t) \to c_i$ as $t \to \infty$, $c_i \ge 0$, $i = 1, 2$, $c_1 + c_2 > 0$, then

\[ G_1 * G_2(t) \sim G_1(t) + G_2(t) \sim (c_1 + c_2) G(t). \]

(ii) If $G_0(t) \sim cG(t)$ as $t \to \infty$, $c > 0$, then $G_0 \in \mathcal{S}$.

(iii) For any fixed $n \ge 2$,

\[ G^{n*}(t) \sim nG(t) \quad \text{as } t \to \infty. \tag{A6.4.15} \]

(iv) For any $\varepsilon > 0$ there exists a $b = b(\varepsilon) < \infty$ such that

\[ \frac{G^{n*}(t)}{G(t)} \le b(1 + \varepsilon)^n \]

for all $n \ge 2$ and $t$.

In addition to assertions (i) and (ii) of the theorem, we can also show that if $G \in \mathcal{S}$ and the function $m(t) \in \mathcal{L}$ possesses the property

\[ 0 < m_1 \le m(t) \le m_2 < \infty \]

then $G_1(t) = m(t)G(t) \in \mathcal{S}$.

Theorems A6.4.1(vi), A6.4.2 and A6.4.3(iii) imply the following simple statement elucidating the subexponentiality condition for random variables taking values of both signs.

Corollary A6.4.1 A distribution $G$ belongs to $\mathcal{S}$ if and only if $G \in \mathcal{L}$ and $G^{2*}(t) \sim 2G(t)$ as $t \to \infty$.


Remark A6.4.5 Evidently the asymptotic relation $G_1(t) \sim G_2(t)$ as $t \to \infty$ is an equivalence relation on the set of distributions on $\mathbb{R}$. Theorem A6.4.3(ii) means that the class $\mathcal{S}$ is closed with respect to that equivalence. One can easily see that in each of the equivalence subclasses of the class $\mathcal{S}$ with respect to this relation there is always a distribution with an arbitrarily smooth tail $G(t)$.

Indeed, let $p(t)$ be an infinitely differentiable probability density on $\mathbb{R}$ vanishing outside $[0, 1]$ (we can take, e.g., $p(x) = c \cdot e^{-1/(x(1-x))}$ if $x \in (0, 1)$ and $p(x) = 0$ if $x \notin (0, 1)$). Now we “smooth” the function $l(t) := -\ln G(t)$, $G \in \mathcal{S}$, putting

\[ l_0(t) := \int p(t-u)\,l(u)\,du, \quad \text{and let } G_0(t) := e^{-l_0(t)}. \tag{A6.4.16} \]

Clearly, $G_0(t)$ is an infinitely differentiable function and, since $l(t)$ is nondecreasing and we actually integrate over $[t-1, t]$ only, one has $l(t-1) \le l_0(t) \le l(t)$ and hence by Theorem A6.4.2

\[ 1 \le \frac{G_0(t)}{G(t)} \le \frac{G(t-1)}{G(t)} \to 1 \quad \text{as } t \to \infty. \]

Thus, the distribution $G_0$ is equivalent to the original $G$. A simpler smoothing procedure leading to a less smooth asymptotically equivalent tail consists of replacing the function $l(t)$ with its linear interpolation with nodes at points $(k, l(k))$, $k$ being an integer.

Therefore, up to a summand $o(1)$, we can always assume the function $l(t) = -\ln G(t)$, $G \in \mathcal{S}$, to be arbitrarily smooth.

The aforesaid is clearly applicable to the class $\mathcal{L}$ as well: it is also closed with respect to the introduced equivalence, and each of its equivalence subclasses contains arbitrarily smooth representatives.


Remark A6.4.6 Theorem A6.4.3(ii) and (iii) immediately imply that if $G \in \mathcal{S}$ then also $G^{n*} \in \mathcal{S}$, $n = 2, 3, \ldots$ Moreover, if we denote by $G^{n\vee}$ the distribution of the maximum of independent identically distributed random variables $\zeta_1, \ldots, \zeta_n \subset\!\!\!= G$, then the evident relation

\[ G^{n\vee}(t) = 1 - (1 - G(t))^n \sim nG(t) \quad \text{as } t \to \infty \tag{A6.4.17} \]

and Theorem A6.4.3(ii) imply that $G^{n\vee}$ also belongs to $\mathcal{S}$.

Relations (A6.4.17) and (A6.4.15) show that, in the case of a subexponential $G$, the tail $G^{n*}(t)$ of the distribution of the sum of a fixed number $n$ of independent identically distributed random variables $\zeta_i \subset\!\!\!= G$ is asymptotically equivalent (as $t \to \infty$) to the tail $G^{n\vee}(t)$ of the maximum of these random variables, i.e., the “large” values of this sum are mainly due to the presence of one “large” term $\zeta_i$ in the sum. It is easy to see that this property is characteristic of subexponentiality.
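Relation (A6.4.17) for the tail of the maximum can be verified directly. The snippet below is a small sketch (the helper name `max_tail` is introduced here for illustration); it uses `log1p` and `expm1` only to avoid cancellation when $G(t)$ is very small.

```python
import math


def max_tail(g: float, n: int) -> float:
    """Tail 1 - (1 - g)**n of the maximum of n i.i.d. variables, each with tail value g."""
    # -expm1(n * log1p(-g)) equals 1 - (1 - g)**n but is stable for tiny g.
    return -math.expm1(n * math.log1p(-g))


if __name__ == "__main__":
    n = 5
    for g in (1e-1, 1e-3, 1e-6):
        print(g, max_tail(g, n) / (n * g))  # ratio tends to 1 as g -> 0
```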

Remark A6.4.7 Note also that an assertion converse to what was stated at the beginning of Remark A6.4.6 is also valid: if $G^{n*} \in \mathcal{S}$ for some $n \ge 2$ then $G \in \mathcal{S}$ as well. That $G^{n\vee} \in \mathcal{S}$ implies $G \in \mathcal{S}$ evidently follows from (A6.4.17) and Theorem A6.4.3(ii).


Proof of Theorem A6.4.3 (i) First assume that $c_1c_2 > 0$ and that both distributions $G_i$ are concentrated on $[0, \infty)$. Fix an arbitrary $\varepsilon > 0$ and choose $M$ large enough to have $G_i(M) < \varepsilon$, $i = 1, 2$, and $G(M) < \varepsilon$, and such that, for $t > M$,

\[ (1 - \varepsilon)c_i < \frac{G_i(t)}{G(t)} < (1 + \varepsilon)c_i, \quad 1 - \varepsilon < \frac{G(t-M)}{G(t)} < 1 + \varepsilon \tag{A6.4.18} \]

(the last inequality holds by virtue of Theorem A6.4.2).

Let $\zeta \subset\!\!\!= G$ and $\zeta_i \subset\!\!\!= G_i$, $i = 1, 2$, be independent random variables. Then, for $t > 2M$, we have the representation

\[ G_1 * G_2(t) = P_1 + P_2 + P_3 + P_4, \tag{A6.4.19} \]

where

\[ P_1 := \mathbf{P}(\zeta_1 \ge t - \zeta_2,\ \zeta_2 \in [0, M)), \]
\[ P_2 := \mathbf{P}(\zeta_2 \ge t - \zeta_1,\ \zeta_1 \in [0, M)), \]
\[ P_3 := \mathbf{P}(\zeta_2 \ge t - \zeta_1,\ \zeta_1 \in [M, t - M)), \]
\[ P_4 := \mathbf{P}(\zeta_2 \ge M,\ \zeta_1 \ge t - M) \]

(see Fig. A.1).

We show that the first two terms on the right-hand side of (A6.4.19) are asymptotically equivalent to $c_1G(t)$ and $c_2G(t)$, respectively, while the last two terms are negligibly small compared with $G(t)$. Indeed, for $P_1$ we have the obvious two-sided bounds

\[ (1 - \varepsilon)^2 c_1 G(t) < G_1(t)(1 - G_2(M)) = \mathbf{P}(\zeta_1 \ge t,\ \zeta_2 \in [0, M)) \le P_1 \le \mathbf{P}(\zeta_1 \ge t - M) = G_1(t - M) \le (1 + \varepsilon)^2 c_1 G(t) \]

by (A6.4.18); the term $P_2$ can be bounded in a similar way. Further,

\[ P_4 = \mathbf{P}(\zeta_2 \ge M,\ \zeta_1 \ge t - M) = G_2(M) G_1(t - M) \le \varepsilon(1 + \varepsilon)^2 c_1 G(t). \]

Fig. A.1 Illustration to the proof of Theorem A6.4.3, showing the regions $P_i$, $i = 1, 2, 3, 4$

It remains to estimate $P_3$ (note that it is here that we will need the condition $G \in \mathcal{S}$; so far we have only used the fact that $G \in \mathcal{L}$). We have

\[ P_3 = \int_{[M, t-M)} G_2(t-y)\,G_1(dy) \le (1 + \varepsilon) c_2 \int_{[M, t-M)} G(t-y)\,G_1(dy), \tag{A6.4.20} \]

where it is clear that, by (A6.4.18), the last integral is equal to

\[ \mathbf{P}(\zeta + \zeta_1 \ge t,\ \zeta_1 \in [M, t-M)) = \mathbf{P}(\zeta \ge t - M,\ \zeta_1 \in [M, t-M)) + \mathbf{P}(\zeta + \zeta_1 \ge t,\ \zeta \in [M, t-M)) \]
\[ = G(t-M)\,G_1([M, t-M)) + \int_{[M, t-M)} G_1(t-y)\,G(dy) \]
\[ \le \varepsilon(1 + \varepsilon) G(t) + (1 + \varepsilon) c_1 \int_{[M, t-M)} G(t-y)\,G(dy). \tag{A6.4.21} \]

Now note that similarly to the above argument we can easily obtain (setting $G_1 = G_2 = G$) that

\[ G^{2*}(t) = (1 + \theta_1\varepsilon)\,2G(t) + \int_{[M, t-M)} G(t-y)\,G(dy) + \varepsilon(1 + \theta_2\varepsilon) G(t), \]

where $|\theta_i| \le 1$, $i = 1, 2$. Since $G^{2*}(t) \sim 2G(t)$ by virtue of $G \in \mathcal{S}_+$, this equality means that the integral on the right-hand side is $o(G(t))$. Now (A6.4.21) immediately implies that also $P_3 = o(G(t))$, and hence the required assertion is established for the case $G \in \mathcal{S}_+$.

To extend the desired result to the case of distributions $G_i$ on $\mathbb{R}$, it suffices to repeat the argument from the proof of Theorem A6.4.1(vi).

The case when one of the $c_i$ can be zero can be reduced to the case $c_1c_2 > 0$, which has already been considered. If, say, $c_1 = 0$ and $c_2 > 0$, then we can introduce the distribution $\widetilde{G}_1 := (G_1 + G)/2$, for which clearly $\widetilde{G}_1(t)/G(t) \to \widetilde{c}_1 = 1/2$, and hence by the already proved assertion, as $t \to \infty$,

\[ \frac{1}{2} + c_2 \sim \frac{\widetilde{G}_1 * G_2(t)}{G(t)} = \frac{G_1 * G_2(t) + G * G_2(t)}{2G(t)} = \frac{G_1 * G_2(t)}{2G(t)} + \frac{1 + c_2}{2}(1 + o(1)), \]

so that $G_1 * G_2(t)/G(t) \to c_2 = c_1 + c_2$.

(ii) Denote by $G_0^+$ the distribution of the random variable $\zeta_0^+$, where $\zeta_0 \subset\!\!\!= G_0$. Since $G_0^+(t) = G_0(t)$ for $t > 0$, it follows immediately from (i) with $G_1 = G_2 = G_0^+$ that $(G_0^+)^{2*}(t) \sim 2G_0^+(t)$, i.e. $G_0 \in \mathcal{S}$.

(iii) If $G \in \mathcal{S}$ then by Theorems A6.4.1(vi) and A6.4.2 we have, as $t \to \infty$,

\[ G^{2*}(t) \sim (G^+)^{2*}(t) \sim 2G(t). \]

Now relation (A6.4.15) follows immediately from (i) by induction.

(iv) Similarly to (A6.4.11), we have $G^{n*}(t) \le (G^+)^{n*}(t)$, $n \ge 1$. Therefore it is clear that it suffices to consider the case $G \in \mathcal{S}_+$. Put

\[ \alpha_n := \sup_{t \ge 0} \frac{G^{n*}(t)}{G(t)}. \]

Similarly to (A6.4.14), for $n \ge 2$, we have

\[ G^{n*}(t) = G(t) + \int_0^t G^{(n-1)*}(t-y)\,G(dy), \]

and hence, for each $M > 0$,

\[ \alpha_n \le 1 + \sup_{0 \le t \le M} \int_0^t \frac{G^{(n-1)*}(t-y)}{G(t)}\,G(dy) + \sup_{t > M} \int_0^t \frac{G^{(n-1)*}(t-y)}{G(t-y)}\, \frac{G(t-y)}{G(t)}\,G(dy) \]
\[ \le 1 + \frac{1}{G(M)} + \alpha_{n-1} \sup_{t > M} \frac{G^{2*}(t) - G(t)}{G(t)}. \]

Since $G \in \mathcal{S}$, for any $\varepsilon > 0$ there exists an $M = M(\varepsilon)$ such that

\[ \sup_{t > M} \frac{G^{2*}(t) - G(t)}{G(t)} < 1 + \varepsilon \]

and hence

\[ \alpha_n \le b_0 + \alpha_{n-1}(1 + \varepsilon), \quad b_0 := 1 + 1/G(M), \quad \alpha_1 = 1. \]

This recurrently implies

\[ \alpha_n \le b_0 + b_0(1 + \varepsilon) + \alpha_{n-2}(1 + \varepsilon)^2 \le \cdots \le b_0 \sum_{j=0}^{n-1} (1 + \varepsilon)^j \le \frac{b_0}{\varepsilon}(1 + \varepsilon)^n. \]

The theorem is proved.


□ 


Appendix  7 

The  Proofs  of  Theorems  on  Convergence 
to  Stable  Laws 


In this appendix we will prove Theorems 8.8.1-8.8.4.

7.1  The  Integral  Limit  Theorem 

In this section we will prove Theorem 8.8.1 on convergence of the distributions of normalised sums $S_n = \sum_{k=1}^{n} \xi_k$ to stable laws. Recall the basic notation:

\[ F_+(t) := \mathbf{P}(\xi \ge t), \quad F_-(t) := \mathbf{P}(\xi < -t), \]
\[ F_0(t) := F_+(t) + F_-(t) = \mathbf{P}(\xi \notin [-t, t)). \]

The main condition used in the theorem has this form:

$[\mathbf{R}_{\beta,\rho}]$ The total tail $F_0(x) = F_-(x) + F_+(x)$ is an r.v.f. as $x \to \infty$, i.e., can be represented as

\[ F_0(x) = x^{-\beta} L_{F_0}(x), \quad \beta \in (0, 2], \tag{A7.1.1} \]

where $L_{F_0}(x)$ is an s.v.f., and there exists the limit

\[ \rho_+ := \lim_{x \to \infty} \frac{F_+(x)}{F_0(x)} \in [0, 1], \quad \rho := 2\rho_+ - 1. \tag{A7.1.2} \]

In the case $\beta < 2$ we put

\[ b(n) := F_0^{(-1)}(1/n), \tag{A7.1.3} \]

while for $\beta = 2$ we set

\[ b(n) := Y^{(-1)}(1/n), \tag{A7.1.4} \]

where

\[ Y(t) := 2t^{-2} \int_0^t y F_0(y)\,dy = t^{-2}\, \mathbf{E}(\xi^2;\ -t \le \xi < t) = t^{-2} L_Y(t), \tag{A7.1.5} \]

$L_Y(t)$ is an s.v.f., so that (see Theorem A6.2.1(v) of Appendix 6)

\[ b(n) = n^{1/\beta} L_b(n), \quad L_b \text{ is an s.v.f.} \]
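To make the scaling factor (A7.1.3) concrete, the following sketch computes $b(n)$ by bisection for an illustrative total tail $F_0(t) = (1+t)^{-\beta}$, for which $b(n) = n^{1/\beta} - 1$ in closed form. The tail, the function names and the convention used for the generalised inverse (here, the point where the continuous, strictly decreasing tail crosses the level $1/n$) are assumptions made for this example.

```python
def scaling_factor(n: int, tail, iterations: int = 200) -> float:
    """Solve tail(b) = 1/n by bisection for a continuous, strictly decreasing
    tail with tail(0) = 1 (an illustrative stand-in for F_0^{(-1)}(1/n))."""
    level = 1.0 / n
    lo, hi = 0.0, 1.0
    while tail(hi) >= level:  # expand the bracket until tail(hi) < 1/n
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if tail(mid) < level:
            hi = mid
        else:
            lo = mid
    return hi


def example_tail(t: float, beta: float = 1.5) -> float:
    # Illustrative total tail F_0(t) = (1 + t)**(-beta), t >= 0.
    return (1.0 + t) ** (-beta)


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        b = scaling_factor(n, example_tail)
        print(n, b, n ** (1.0 / 1.5) - 1.0, n * example_tail(b))
```

The last printed column shows $nF_0(b(n)) \approx 1$, the property of $b(n)$ that is used in the proof of Theorem A7.1.1.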

In the case when $F_+(t)$ and $F_-(t)$ are regularly varying functions (for instance, when condition $[\mathbf{R}_{\beta,\rho}]$ is satisfied and $\rho = 0$), we will denote these functions by $V(t)$ and $W(t)$, respectively, and put

\[ V_I(t) := \int_0^t V(y)\,dy, \quad V^I(t) := \int_t^\infty V(y)\,dy; \]

the same notational convention will be used for $W$.

If $F_+(t) = o(F_0(t))$ as $t \to \infty$ ($\rho = -1$), then $F_+(t)$ is not necessarily a regularly varying function, but everything we say below regarding the sums $V(t) + W(t)$ and $V^I(t) + W^I(t)$ remains valid if we understand by their first summands quantities negligibly small compared to the second summands (the first summands can also be replaced by zeros). This is also true for the sums $V_I(t) + W_I(t)$, except for the case when $\mathbf{E}\max(0, \xi)$ exists and $V_I(t)$ has to be replaced by $\mathbf{E}(\xi;\ \xi \ge 0) + o(1)$.


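Theorem A7.1.1 Let condition $[\mathbf{R}_{\beta,\rho}]$ be satisfied and $\zeta_n := \dfrac{S_n}{b(n)}$.

(i) For $\beta \in (0, 2)$, $\beta \ne 1$, and scaling factor (A7.1.3), as $n \to \infty$,

\[ \zeta_n \Rightarrow \zeta^{(\beta,\rho)}, \tag{A7.1.6} \]

where the distribution of the random variable $\zeta^{(\beta,\rho)}$ depends on the parameters $\beta$ and $\rho$ only and has ch.f.

\[ \varphi^{(\beta,\rho)}(t) := \mathbf{E} e^{it\zeta^{(\beta,\rho)}} = \exp\{|t|^{\beta} B(\beta, \rho, \vartheta)\}, \tag{A7.1.7} \]

where $\vartheta := \operatorname{sign} t$,

\[ B(\beta, \rho, \vartheta) := \Gamma(1 - \beta) \left[ i\rho\vartheta \sin\frac{\beta\pi}{2} - \cos\frac{\beta\pi}{2} \right], \tag{A7.1.8} \]

and, for $\beta \in (1, 2)$, we assume that $\Gamma(1 - \beta) = \Gamma(2 - \beta)/(1 - \beta)$.

(ii) When $\beta = 1$, the sequence $\zeta_n$ with scaling factor (A7.1.3), generally speaking, needs to be centred in order to converge to a limiting law. More precisely, we have, as $n \to \infty$,

\[ \zeta_n - A_n \Rightarrow \zeta^{(1,\rho)}, \tag{A7.1.9} \]

where

\[ A_n := \frac{n}{b(n)} \left[ V_I(b(n)) - W_I(b(n)) \right] - \rho C, \tag{A7.1.10} \]

$C \approx 0.5772$ is the Euler constant, and

\[ \varphi^{(1,\rho)}(t) := \mathbf{E} e^{it\zeta^{(1,\rho)}} = \exp\left\{ -\frac{\pi|t|}{2} - i\rho t \ln|t| \right\}. \tag{A7.1.11} \]

If $n[V_I(b(n)) - W_I(b(n))] = o(b(n))$, then $\rho = 0$ and we can put $A_n = 0$.

If there exists $\mathbf{E}\xi = 0$ then

\[ A_n = \frac{n}{b(n)} \left[ W^I(b(n)) - V^I(b(n)) \right] - \rho C. \]

If $\mathbf{E}\xi = 0$ and $\rho \ne 0$ then $\rho A_n \to -\infty$ as $n \to \infty$.

(iii) For $\beta = 2$ and scaling factor (A7.1.4), as $n \to \infty$,

\[ \zeta_n \Rightarrow \zeta^{(2,\rho)}, \quad \varphi^{(2,\rho)}(t) := \mathbf{E} e^{it\zeta^{(2,\rho)}} = e^{-t^2/2}, \]

so that $\zeta^{(2,\rho)}$ has the standard normal distribution which does not depend on $\rho$.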

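Proof We will use the same approach as in the proof of the central limit theorem using relation (8.8.1). We will study the asymptotic properties of the ch.f. $\varphi(t) = \mathbf{E} e^{it\xi}$ in the vicinity of zero (more precisely, the asymptotics of

\[ \varphi\left( \frac{\mu}{b(n)} \right) - 1 \to 0 \]

as $b(n) \to \infty$) and show that, under condition $[\mathbf{R}_{\beta,\rho}]$, for each $\mu \in \mathbb{R}$, we have

\[ n \left( \varphi\left( \frac{\mu}{b(n)} \right) - 1 \right) \to \ln \varphi^{(\beta,\rho)}(\mu) \quad \text{as } n \to \infty \tag{A7.1.12} \]

(or some modification of this relation, see (A7.1.48)). This will imply that, for $\zeta_n = S_n/b(n)$, as $n \to \infty$, there holds the relation (cf. Lemma 8.3.2)

\[ \varphi_{\zeta_n}(\mu) \to \varphi^{(\beta,\rho)}(\mu). \tag{A7.1.13} \]

Indeed,

\[ \varphi_{\zeta_n}(\mu) = \varphi^n\left( \frac{\mu}{b(n)} \right). \]

Since $\varphi(t) \to 1$ as $t \to 0$, one has

\[ \ln \varphi_{\zeta_n}(\mu) = n \ln \varphi\left( \frac{\mu}{b(n)} \right) = n \ln\left( 1 + \left[ \varphi\left( \frac{\mu}{b(n)} \right) - 1 \right] \right) = n \left[ \varphi\left( \frac{\mu}{b(n)} \right) - 1 \right] + R_n, \]

where $|R_n| \le n |\varphi(\mu/b(n)) - 1|^2$ for all $n$ large enough, and hence $R_n \to 0$ by virtue of (A7.1.12). It follows that (A7.1.12) implies (A7.1.13).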
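The bound on the remainder $R_n$ rests on the elementary inequality $|\ln(1+z) - z| \le |z|^2$ for complex $z$ with $|z| \le 1/2$ (it follows from the power series of the logarithm, since $\sum_{k \ge 2} |z|^k/k \le |z|^2/(2(1-|z|))$). A minimal numerical check of this inequality, with sample points chosen here purely for illustration:

```python
import cmath
import math


def log_remainder(z: complex) -> complex:
    """Return ln(1 + z) - z, the remainder used to pass from n*ln(phi) to n*(phi - 1)."""
    return cmath.log(1.0 + z) - z


def sample_points(radii=(0.01, 0.1, 0.3, 0.5), angles: int = 24):
    """Points r * e^{i*theta} on a few circles inside the disc |z| <= 1/2."""
    return [r * cmath.exp(2j * math.pi * k / angles)
            for r in radii for k in range(angles)]


if __name__ == "__main__":
    worst = max(abs(log_remainder(z)) / abs(z) ** 2 for z in sample_points())
    print(worst)  # stays below 1 on the sampled points
```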

So first we will study the asymptotics of $\varphi(t)$ as $t \to 0$ and then establish (A7.1.12).

(i) First let $\beta \in (0, 1)$. We have

\[ \varphi(t) = -\int_0^\infty e^{itx}\,dV(x) - \int_0^\infty e^{-itx}\,dW(x). \tag{A7.1.14} \]

Consider the former integral

\[ -\int_0^\infty e^{itx}\,dV(x) = V(0) + it \int_0^\infty e^{itx} V(x)\,dx, \tag{A7.1.15} \]

where the substitution $|t|x = y$, $|t| = 1/m$ yields

\[ I_+(t) := it \int_0^\infty e^{itx} V(x)\,dx = i\vartheta \int_0^\infty e^{i\vartheta y} V(my)\,dy, \tag{A7.1.16} \]

$\vartheta = \operatorname{sign} t$ (we will henceforth exclude the trivial case $t = 0$).

Assume for the present that $\rho_+ > 0$. Then $V(x)$ is an r.v.f. as $x \to \infty$ and, for each $y$, by virtue of the properties of s.v.f.s we have, as $|t| \to 0$,

\[ V(my) \sim y^{-\beta} V(m). \]

Therefore it is natural to expect that, as $|t| \to 0$,

\[ I_+(t) \sim i\vartheta V(m) \int_0^\infty e^{i\vartheta y} y^{-\beta}\,dy = i\vartheta V(m) A(\beta, \vartheta), \tag{A7.1.17} \]

where

\[ A(\beta, \vartheta) := \int_0^\infty e^{i\vartheta y} y^{-\beta}\,dy. \tag{A7.1.18} \]

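Assume that relation (A7.1.17) holds and similarly (in the case when $\rho_- > 0$)

\[ -\int_0^\infty e^{-itx}\,dW(x) = W(0) + I_-(t), \tag{A7.1.19} \]

where

\[ I_-(t) := -it \int_0^\infty e^{-itx} W(x)\,dx \sim -i\vartheta W(m) \int_0^\infty e^{-i\vartheta y} y^{-\beta}\,dy = -i\vartheta W(m) A(\beta, -\vartheta). \tag{A7.1.20} \]

Since $V(0) + W(0) = 1$, relations (A7.1.14)-(A7.1.20) mean that, as $t \to 0$,

\[ \varphi(t) = 1 + F_0(m)\, i\vartheta \left[ \rho_+ A(\beta, \vartheta) - \rho_- A(\beta, -\vartheta) \right] (1 + o(1)). \tag{A7.1.21} \]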

We can find an explicit form of the integral $A(\beta, \vartheta)$. Observe that the integral of the function $e^{iz} z^{-\beta}$ along the boundary of the positive quadrant (closed as a contour) in the complex plane is equal to zero. From this it is not hard to obtain that

\[ A(\beta, \vartheta) = \Gamma(1 - \beta)\, e^{i\vartheta(1-\beta)\pi/2}, \quad \beta > 0. \tag{A7.1.22} \]

(Note also that (A7.1.18) is a table integral and its value can be found in handbooks, see, e.g., integrals 3.761.4 and 3.761.9 in [18].)

Thus, in (A7.1.21) one has

\[ i\vartheta \left[ \rho_+ A(\beta, \vartheta) - \rho_- A(\beta, -\vartheta) \right] = i\vartheta\, \Gamma(1 - \beta) \left[ \rho_+ \cos\frac{(1-\beta)\pi}{2} + i\vartheta\rho_+ \sin\frac{(1-\beta)\pi}{2} - \rho_- \cos\frac{(1-\beta)\pi}{2} + i\vartheta\rho_- \sin\frac{(1-\beta)\pi}{2} \right] \]
\[ = \Gamma(1 - \beta) \left[ i\vartheta(\rho_+ - \rho_-) \cos\frac{(1-\beta)\pi}{2} - \sin\frac{(1-\beta)\pi}{2} \right] \]
\[ = \Gamma(1 - \beta) \left[ i\vartheta\rho \sin\frac{\beta\pi}{2} - \cos\frac{\beta\pi}{2} \right] = B(\beta, \rho, \vartheta), \]

where $B(\beta, \rho, \vartheta)$ is defined in (A7.1.8). Hence, as $t \to 0$,

\[ \varphi(t) - 1 = F_0(m) B(\beta, \rho, \vartheta)(1 + o(1)). \tag{A7.1.23} \]

Putting $t = \mu/b(n)$ (so that $m = b(n)/|\mu|$), where $b(n)$ is defined in (A7.1.3), and taking into account that $F_0(b(n)) \sim 1/n$, we obtain

\[ n \left[ \varphi\left( \frac{\mu}{b(n)} \right) - 1 \right] = n F_0\left( \frac{b(n)}{|\mu|} \right) B(\beta, \rho, \vartheta)(1 + o(1)) \sim |\mu|^{\beta} B(\beta, \rho, \vartheta). \tag{A7.1.24} \]

We have established the validity of (A7.1.12) and therefore that of assertion (i) of the theorem in the case $\beta < 1$, $\rho_+ > 0$.
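The algebra leading from (A7.1.21) and (A7.1.22) to $B(\beta, \rho, \vartheta)$ can be verified numerically with complex arithmetic. The following sketch (function names are chosen here for illustration) evaluates both sides for several parameter values, using $\rho_- = 1 - \rho_+$ and $\rho = 2\rho_+ - 1$.

```python
import cmath
import math


def a_integral(beta: float, theta: int) -> complex:
    """Closed form (A7.1.22): Gamma(1 - beta) * exp(i * theta * (1 - beta) * pi / 2)."""
    return math.gamma(1.0 - beta) * cmath.exp(1j * theta * (1.0 - beta) * math.pi / 2.0)


def b_coefficient(beta: float, rho: float, theta: int) -> complex:
    """B(beta, rho, theta) as defined in (A7.1.8)."""
    return math.gamma(1.0 - beta) * (
        1j * rho * theta * math.sin(beta * math.pi / 2.0) - math.cos(beta * math.pi / 2.0)
    )


def bracket_from_a(beta: float, rho_plus: float, theta: int) -> complex:
    """i * theta * [rho_+ A(beta, theta) - rho_- A(beta, -theta)] from (A7.1.21)."""
    rho_minus = 1.0 - rho_plus
    return 1j * theta * (
        rho_plus * a_integral(beta, theta) - rho_minus * a_integral(beta, -theta)
    )


if __name__ == "__main__":
    for beta in (0.3, 0.6, 0.9):
        for rho_plus in (0.2, 0.5, 1.0):
            for theta in (-1, 1):
                lhs = bracket_from_a(beta, rho_plus, theta)
                rhs = b_coefficient(beta, 2.0 * rho_plus - 1.0, theta)
                print(beta, rho_plus, theta, abs(lhs - rhs))
```

The printed differences are at the level of rounding error, confirming the identity for the sampled parameters. Note that the real part of $B$ equals $-\Gamma(1-\beta)\cos(\beta\pi/2) < 0$ for $\beta \in (0, 1)$, as required for $\exp\{|t|^\beta B\}$ to be a ch.f.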

If $\rho_+ = 0$ ($\rho_- = 0$) then, as was already mentioned, the above argument remains valid if we replace $V(m)$ ($W(m)$) by zero. This follows from the fact that in this case $F_+(t)$ ($F_-(t)$) admits a regularly varying majorant $V^*(t) = o(W(t))$ ($W^*(t) = o(V(t))$).

It remains only to justify the asymptotic equivalence in (A7.1.17). To do that, it is sufficient to verify that the integrals

$$\int_0^{\varepsilon} e^{i\vartheta y}V(my)\,dy,\qquad \int_M^{\infty} e^{i\vartheta y}V(my)\,dy \tag{A7.1.25}$$

can be made arbitrarily small compared to $V(m)$ by choosing appropriate $\varepsilon$ and $M$. Note first that by Theorem A6.2.1(iii) of Appendix 6 (see (A6.1.2) in Appendix 6), for any $\delta>0$, there exists an $x_\delta>0$ such that, for all $v\le1$ and $vx\ge x_\delta$, we have

$$\frac{V(vx)}{V(x)} \le (1+\delta)v^{-\beta-\delta}.$$

Therefore, for $\delta<1-\beta$ and $x>x_\delta$,

$$\begin{aligned}
\int_0^x V(u)\,du &\le x_\delta + \int_{x_\delta}^x V(u)\,du = x_\delta + xV(x)\int_{x_\delta/x}^1 \frac{V(vx)}{V(x)}\,dv\\
&\le x_\delta + xV(x)(1+\delta)\int_0^1 v^{-\beta-\delta}\,dv
= x_\delta + \frac{xV(x)(1+\delta)}{1-\beta-\delta} \le cxV(x),
\end{aligned}\tag{A7.1.26}$$

since $xV(x)\to\infty$ as $x\to\infty$. It follows that

$$\Bigl|\int_0^{\varepsilon} e^{i\vartheta y}V(my)\,dy\Bigr| \le \frac1m\int_0^{\varepsilon m}V(u)\,du \le c\varepsilon V(\varepsilon m) \sim c\varepsilon^{1-\beta}V(m).$$

Since $\varepsilon^{1-\beta}\to0$ as $\varepsilon\to0$, the first assertion in (A7.1.25) is proved. The second integral in (A7.1.25) is equal to

$$\begin{aligned}
\int_M^\infty e^{i\vartheta y}V(my)\,dy &= \frac{1}{i\vartheta}e^{i\vartheta y}V(my)\Big|_M^\infty - \frac{1}{i\vartheta}\int_M^\infty e^{i\vartheta y}\,dV(my)\\
&= -\frac{1}{i\vartheta}e^{i\vartheta M}V(mM) - \frac{1}{i\vartheta}\int_{mM}^\infty e^{i\vartheta u/m}\,dV(u),
\end{aligned}$$

so its absolute value does not exceed

$$2V(mM) \sim 2M^{-\beta}V(m) \tag{A7.1.27}$$

as $m\to\infty$. Hence the value of the second integral in (A7.1.25) can also be made arbitrarily small compared to $V(m)$ by choosing an appropriate $M$. Relation (A7.1.17) together with the assertion of the theorem in the case $\beta<1$ are proved.

Let now $\beta\in(1,2)$ and hence there exists a finite expectation $\mathbf E\xi$ which, according to our condition, will be assumed to be equal to zero. In this case,

$$\varphi(t) - 1 = \int_0^t \varphi'(u)\,du = \vartheta\int_0^{|t|}\varphi'(\vartheta u)\,du,\qquad \vartheta = \operatorname{sign} t, \tag{A7.1.28}$$

and we have to find the asymptotic behaviour of

$$\varphi'(t) = -i\int_0^\infty xe^{itx}\,dV(x) + i\int_0^\infty xe^{-itx}\,dW(x) =: I_+^{(1)}(t) + I_-^{(1)}(t) \tag{A7.1.29}$$

as $t\to0$. Since $x\,dV(x) = d(xV(x)) - V(x)\,dx$, integration by parts yields

$$\begin{aligned}
I_+^{(1)}(t) &:= -i\int_0^\infty xe^{itx}\,dV(x) = -i\int_0^\infty e^{itx}\,d\bigl(xV(x)\bigr) + i\int_0^\infty e^{itx}V(x)\,dx\\
&= -t\int_0^\infty xV(x)e^{itx}\,dx + iV^I(0) - t\int_0^\infty V^I(x)e^{itx}\,dx\\
&= iV^I(0) - t\int_0^\infty \widetilde V(x)e^{itx}\,dx,
\end{aligned}\tag{A7.1.30}$$

where, by Theorem A6.2.1(iv) of Appendix 6, both functions

$$V^I(x) := \int_x^\infty V(u)\,du \sim \frac{xV(x)}{\beta-1}\quad\text{as } x\to\infty,\qquad V^I(0)<\infty,$$

and

$$\widetilde V(x) := xV(x) + V^I(x) \sim \frac{\beta xV(x)}{\beta-1}$$

are regularly varying.

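Letting, as before, $m = 1/|t|$, $m\to\infty$ (cf. (A7.1.16), (A7.1.17)), we get

$$\begin{aligned}
-t\int_0^\infty \widetilde V(x)e^{itx}\,dx &= -\vartheta\int_0^\infty \widetilde V(my)e^{i\vartheta y}\,dy\\
&\sim -\vartheta\widetilde V(m)\int_0^\infty y^{-\beta+1}e^{i\vartheta y}\,dy = -\frac{\beta V(m)}{t(\beta-1)}A(\beta-1,\vartheta),
\end{aligned}$$

$$I_+^{(1)}(t) = iV^I(0) - \frac{\beta V(m)}{t(\beta-1)}A(\beta-1,\vartheta)(1+o(1)), \tag{A7.1.31}$$

where the function $A(\beta,\vartheta)$ defined in (A7.1.18) is given by (A7.1.22).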

Similarly,

$$\begin{aligned}
I_-^{(1)}(t) &:= i\int_0^\infty xe^{-itx}\,dW(x)\\
&= -t\int_0^\infty xW(x)e^{-itx}\,dx - iW^I(0) - t\int_0^\infty W^I(x)e^{-itx}\,dx\\
&= -iW^I(0) - t\int_0^\infty \widetilde W(x)e^{-itx}\,dx,
\end{aligned}$$

where

$$W^I(x) := \int_x^\infty W(u)\,du,\qquad \widetilde W(x) := xW(x) + W^I(x) \sim \frac{\beta xW(x)}{\beta-1},$$

and

$$-t\int_0^\infty \widetilde W(x)e^{-itx}\,dx \sim -\frac{\beta W(m)}{t(\beta-1)}A(\beta-1,-\vartheta).$$

Therefore

$$I_-^{(1)}(t) = -iW^I(0) - \frac{\beta W(m)}{t(\beta-1)}A(\beta-1,-\vartheta)(1+o(1))$$

and hence, by virtue of (A7.1.29), (A7.1.31), and the equality $V^I(0) - W^I(0) = \mathbf E\xi = 0$, we have

$$\varphi'(t) = -\frac{\beta F_0(m)}{t(\beta-1)}\bigl[\rho_+A(\beta-1,\vartheta) + \rho_-A(\beta-1,-\vartheta)\bigr](1+o(1)).$$

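We return now to relation (A7.1.28). Since

$$\int_0^{|t|} u^{-1}F_0(u^{-1})\,du \sim \beta^{-1}F_0(|t|^{-1}) = \beta^{-1}F_0(m)$$

(see Theorem A6.2.1(iii) of Appendix 6), we obtain, again using (A7.1.22) and an argument similar to the one in the proof for the case $\beta<1$, that

$$\begin{aligned}
\varphi(t) - 1 &= -\frac{1}{\beta-1}F_0(m)\bigl[\rho_+A(\beta-1,\vartheta) + \rho_-A(\beta-1,-\vartheta)\bigr](1+o(1))\\
&= -\frac{\Gamma(2-\beta)}{\beta-1}F_0(m)\Bigl[\rho_+\Bigl(\cos\frac{(2-\beta)\pi}{2} + i\vartheta\sin\frac{(2-\beta)\pi}{2}\Bigr)\\
&\qquad\qquad + \rho_-\Bigl(\cos\frac{(2-\beta)\pi}{2} - i\vartheta\sin\frac{(2-\beta)\pi}{2}\Bigr)\Bigr](1+o(1))\\
&= \frac{\Gamma(2-\beta)}{\beta-1}F_0(m)\Bigl[\cos\frac{\beta\pi}{2} - i\vartheta\rho\sin\frac{\beta\pi}{2}\Bigr](1+o(1))\\
&= F_0(m)B(\beta,\rho,\vartheta)(1+o(1)).
\end{aligned}\tag{A7.1.32}$$

We arrive once again at relation (A7.1.23) which, by virtue of (A7.1.24), implies the assertion of the theorem for $\beta\in(1,2)$.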

(ii) Case $\beta=1$. In this case, the computation is somewhat more complicated. We again follow relations (A7.1.14)-(A7.1.16), according to which

$$\varphi(t) = 1 + I_+(t) + I_-(t). \tag{A7.1.33}$$

Rewrite expression (A7.1.16) for $I_+(t)$ as

$$I_+(t) = i\vartheta\int_0^\infty e^{i\vartheta y}V(my)\,dy = i\vartheta\int_0^\infty V(my)\cos y\,dy - \int_0^\infty V(my)\sin y\,dy, \tag{A7.1.34}$$

where the first integral on the right-hand side can be represented as the sum of two integrals:

$$\int_0^1 V(my)\,dy + \int_0^\infty g(y)V(my)\,dy, \tag{A7.1.35}$$

where

$$g(y) = \begin{cases}\cos y - 1 & \text{if } y\le1,\\ \cos y & \text{if } y>1.\end{cases} \tag{A7.1.36}$$

Note that (see, e.g., integral 3.782 in [18]) the value of the integral

$$-\int_0^\infty g(y)y^{-1}\,dy = C \approx 0.5772 \tag{A7.1.37}$$

is the Euler constant. Since $V(ym)/V(m)\to y^{-1}$ as $m\to\infty$, similarly to the above argument we obtain for the second integral in (A7.1.35) the relation

$$\int_0^\infty g(y)V(my)\,dy \sim -CV(m). \tag{A7.1.38}$$
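
The value of the table integral of $g(y)/y$ can be reproduced numerically. The following is a rough sketch rather than part of the proof: it uses a composite Simpson rule, and on $(1,\infty)$ it first integrates by parts twice, $\int_1^\infty y^{-1}\cos y\,dy = -\sin 1+\cos 1-2\int_1^\infty y^{-3}\cos y\,dy$, so that the remaining integrand is absolutely integrable. The truncation point `upper` and the step counts are arbitrary choices, and the function names are hypothetical.

```python
import math


def simpson(f, a: float, b: float, n: int) -> float:
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0


def integral_of_g_over_y(upper: float = 1000.0) -> float:
    """Approximates the integral of g(y)/y over (0, inf), with g as in (A7.1.36)."""

    def near_zero(y: float) -> float:
        # (cos y - 1)/y is bounded on (0, 1] and tends to 0 as y -> 0.
        return (math.cos(y) - 1.0) / y if y > 0.0 else 0.0

    part1 = simpson(near_zero, 0.0, 1.0, 2000)
    # Two integrations by parts on (1, inf); the tail beyond `upper` is at most upper**-2.
    tail = simpson(lambda y: math.cos(y) / y ** 3, 1.0, upper, 400_000)
    part2 = -math.sin(1.0) + math.cos(1.0) - 2.0 * tail
    return part1 + part2
```

The returned value is close to $-0.5772$, in agreement with the sign convention used in (A7.1.38) and (A7.1.41).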


Consider now the first integral in (A7.1.35):

$$\int_0^1 V(my)\,dy = m^{-1}\int_0^m V(u)\,du = m^{-1}V_I(m), \tag{A7.1.39}$$

where

$$V_I(x) := \int_0^x V(u)\,du \tag{A7.1.40}$$

can easily be seen to be an s.v.f. in the case $\beta=1$ (see Theorem A6.2.1(iv) of Appendix 6). Here if $\mathbf E|\xi| = \infty$ then $V_I(x)\to\infty$ as $x\to\infty$, and if $\mathbf E|\xi|<\infty$ then $V_I(x)\to V_I(\infty)<\infty$.

Thus, for the first term on the right-hand side of (A7.1.34) we have

$$\operatorname{Im} I_+(t) = \vartheta\bigl(-CV(m) + m^{-1}V_I(m)\bigr) + o(V(m)). \tag{A7.1.41}$$


Now we will determine how $V_I(vx)$ depends on $v$ as $x\to\infty$. For any fixed $v>0$,

$$V_I(vx) = V_I(x) + \int_x^{vx}V(u)\,du = V_I(x) + xV(x)\int_1^v \frac{V(yx)}{V(x)}\,dy.$$

By Theorem A6.2.1 of Appendix 6,

$$\int_1^v \frac{V(yx)}{V(x)}\,dy \sim \int_1^v \frac{dy}{y} = \ln v,$$

so that

$$V_I(vx) = V_I(x) + (1+o(1))xV(x)\ln v =: A_V(v,x) + xV(x)\ln v, \tag{A7.1.42}$$

where evidently

$$A_V(v,x) = V_I(x) + o(xV(x))\quad\text{as } x\to\infty \tag{A7.1.43}$$

and $V_I(x)\gg xV(x)$ by Theorem A6.2.1(iv) of Appendix 6.

Therefore, for $t=\mu/b(n)$ (so that $m = b(n)/|\mu|$ and hence $V(m)\sim\rho_+|\mu|/n$), we obtain from (A7.1.41) and (A7.1.42) (where one has to put $x=b(n)$, $v=1/|\mu|$) that the following representation is valid as $n\to\infty$:

$$\begin{aligned}
\operatorname{Im} I_+(t) &= -C\frac{\rho_+\mu}{n} + \frac{\mu}{b(n)}\Bigl[A_V\bigl(|\mu|^{-1},b(n)\bigr) - \frac{\rho_+b(n)}{n}\ln|\mu|\Bigr] + o(n^{-1})\\
&= \frac{\mu}{b(n)}A_V\bigl(|\mu|^{-1},b(n)\bigr) - \frac{\rho_+\mu}{n}\bigl(C+\ln|\mu|\bigr) + o(n^{-1}).
\end{aligned}\tag{A7.1.44}$$


For the second term on the right-hand side of (A7.1.34) we have

$$\operatorname{Re} I_+(t) = -\int_0^\infty V(my)\sin y\,dy \sim -V(m)\int_0^\infty y^{-1}\sin y\,dy.$$

Because $\sin y\sim y$ as $y\to0$, the last integral converges. Since $\Gamma(\gamma)\sim1/\gamma$ as $\gamma\to0$, the value of this integral can be found to be (see (A7.1.22))

$$\lim_{\gamma\to0}\Gamma(\gamma)\sin\frac{\gamma\pi}{2} = \frac{\pi}{2}. \tag{A7.1.45}$$

Thus, for $t=\mu/b(n)$,

$$\operatorname{Re} I_+(t) = -\frac{\pi|\mu|\rho_+}{2n} + o(n^{-1}). \tag{A7.1.46}$$


In a similar way we can find an asymptotic representation for the integral $I_-(t)$ (see (A7.1.14)-(A7.1.20)):

$$I_-(t) := -i\vartheta\int_0^\infty W(my)e^{-i\vartheta y}\,dy = -i\vartheta\int_0^\infty W(my)\cos y\,dy - \int_0^\infty W(my)\sin y\,dy.$$

Comparing this with (A7.1.34) and the subsequent computation of $I_+(t)$, we can immediately conclude that, for $t=\mu/b(n)$ (cf. (A7.1.44), (A7.1.46)),

$$\begin{aligned}
\operatorname{Im} I_-(t) &= -\frac{\mu}{b(n)}A_W\bigl(|\mu|^{-1},b(n)\bigr) + \frac{\rho_-\mu}{n}\bigl(C+\ln|\mu|\bigr) + o(n^{-1}),\\
\operatorname{Re} I_-(t) &= -\frac{\pi|\mu|\rho_-}{2n} + o(n^{-1}).
\end{aligned}\tag{A7.1.47}$$


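Thus relations (A7.1.33), (A7.1.44), (A7.1.46) and (A7.1.47) imply

$$\varphi\Bigl(\frac{\mu}{b(n)}\Bigr) - 1 = -\frac{\pi|\mu|}{2n} - \frac{i\rho\mu}{n}\bigl(C+\ln|\mu|\bigr) + \frac{i\mu}{b(n)}\Bigl[A_V\bigl(|\mu|^{-1},b(n)\bigr) - A_W\bigl(|\mu|^{-1},b(n)\bigr)\Bigr] + o(n^{-1}).$$

It follows from (A7.1.43) that the penultimate term here is equal to

$$\frac{i\mu}{b(n)}\bigl[V_I(b(n)) - W_I(b(n))\bigr] + o(n^{-1}),$$

so that finally,

$$\varphi\Bigl(\frac{\mu}{b(n)}\Bigr) - 1 = -\frac{\pi|\mu|}{2n} - \frac{i\rho\mu}{n}\ln|\mu| + \frac{i\mu}{n}A_n + o(n^{-1}), \tag{A7.1.48}$$

where

$$A_n = \frac{n}{b(n)}\bigl[V_I(b(n)) - W_I(b(n))\bigr] - \rho C.$$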


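Therefore, similarly to (A7.1.12) and (A7.1.13), we obtain

$$\begin{aligned}
\varphi_{\zeta_n-A_n}(\mu) &= e^{-i\mu A_n}\varphi^n\Bigl(\frac{\mu}{b(n)}\Bigr) = \exp\Bigl\{-i\mu A_n + n\ln\Bigl(1+\Bigl[\varphi\Bigl(\frac{\mu}{b(n)}\Bigr)-1\Bigr]\Bigr)\Bigr\}\\
&= \exp\Bigl\{-i\mu A_n + n\Bigl(\varphi\Bigl(\frac{\mu}{b(n)}\Bigr)-1\Bigr) + nO\Bigl(\Bigl|\varphi\Bigl(\frac{\mu}{b(n)}\Bigr)-1\Bigr|^2\Bigr)\Bigr\}.
\end{aligned}$$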

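As, for $\beta=1$, by Theorem A6.2.1(iv) of Appendix 6, the functions $V_I$ and $W_I$ are slowly varying, by (A7.1.48) one has

$$n\Bigl|\varphi\Bigl(\frac{\mu}{b(n)}\Bigr)-1\Bigr|^2 \le c\Bigl(\frac1n + \frac{A_n^2}{n}\Bigr) \le c_1\Bigl(\frac1n + \frac{n}{b^2(n)}\bigl[V_I(b(n))^2 + W_I(b(n))^2\bigr]\Bigr)\to0.$$

Since clearly

$$-i\mu A_n + n\Bigl(\varphi\Bigl(\frac{\mu}{b(n)}\Bigr)-1\Bigr) \to -\frac{\pi|\mu|}{2} - i\rho\mu\ln|\mu|,$$

we have

$$\varphi_{\zeta_n-A_n}(\mu) \to \exp\Bigl\{-\frac{\pi|\mu|}{2} - i\rho\mu\ln|\mu|\Bigr\},$$

so relation (A7.1.9) is proved. The subsequent assertions regarding the centring sequence $\{A_n\}$ are evident. □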


(iii) It remains to consider the case $\beta=2$. We will follow representations (A7.1.28)-(A7.1.30), according to which we have to find, as $m = 1/|t|\to\infty$, the asymptotics of

$$\varphi'(t) = I_+^{(1)}(t) + I_-^{(1)}(t), \tag{A7.1.49}$$

where

$$I_+^{(1)}(t) := iV^I(0) - t\int_0^\infty \widetilde V(x)e^{itx}\,dx = iV^I(0) - \vartheta\int_0^\infty \widetilde V(my)e^{i\vartheta y}\,dy \tag{A7.1.50}$$

and, by Theorem A6.2.1(iv) of Appendix 6,

$$V^I(x) = \int_x^\infty V(y)\,dy \sim xV(x),\qquad \widetilde V(x) = xV(x) + V^I(x) \sim 2xV(x) \tag{A7.1.51}$$

as $x\to\infty$. Further,

$$\int_0^\infty \widetilde V(my)e^{i\vartheta y}\,dy = \int_0^\infty \widetilde V(my)\cos y\,dy + i\vartheta\int_0^\infty \widetilde V(my)\sin y\,dy. \tag{A7.1.52}$$

Here the second integral on the right-hand side is asymptotically equivalent, as $m\to\infty$, to (see (A7.1.45))

$$\widetilde V(m)\int_0^\infty y^{-1}\sin y\,dy = \frac{\pi}{2}\widetilde V(m).$$


The first integral on the right-hand side of (A7.1.52) is equal to

$$\int_0^1 \widetilde V(my)\,dy + \int_0^\infty g(y)\widetilde V(my)\,dy,$$

where the function $g(y)$ was defined in (A7.1.36), and

$$\int_0^1 \widetilde V(my)\,dy = \frac1m\int_0^m \widetilde V(u)\,du = \frac1m\widetilde V_I(m),$$

$\widetilde V_I(x) := \int_0^x \widetilde V(u)\,du$ being an s.v.f. by (A7.1.51). Since

$$\int_0^x uV(u)\,du = \frac{x^2V(x)}{2} - \frac12\int_0^x u^2\,dV(u),\qquad
\int_0^x V^I(u)\,du = xV^I(x) + \int_0^x uV(u)\,du$$

and $V^I(x)\sim xV(x)$, we have

$$\begin{aligned}
\widetilde V_I(x) &= \int_0^x \bigl(uV(u) + V^I(u)\bigr)\,du\\
&= xV^I(x) + x^2V(x) - \int_0^x u^2\,dV(u)\\
&= -\int_0^x u^2\,dV(u) + O\bigl(x^2V(x)\bigr),
\end{aligned}$$

where the last term is negligibly small, because

$$\int_0^x uV(u)\,du \gg x^2V(x) \tag{A7.1.53}$$

(see Theorem A6.2.1(iv) of Appendix 6).

It is also clear that, as $x\to\infty$,

$$\widetilde V_I(x)\to\widetilde V_I(\infty) = \mathbf E(\xi^2;\ \xi>0)\in(0,\infty].$$

As a result, we obtain (see also (A7.1.38))

$$\begin{aligned}
I_+^{(1)}(t) &= iV^I(0) - \frac{i\pi}{2}\widetilde V(m) - t\widetilde V_I(m) + \vartheta C\widetilde V(m) + o\bigl(\widetilde V(m)\bigr)\\
&= iV^I(0) - t\widetilde V_I(m)(1+o(1))
\end{aligned}$$

since $\widetilde V_I(x)\gg x\widetilde V(x)$.

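Quite similarly we get

$$I_-^{(1)}(t) = -iW^I(0) - t\widetilde W_I(m)(1+o(1)),$$

where $\widetilde W_I$ is an s.v.f. which is obtained from the function $W$ in the same way as $\widetilde V_I$ from $V$. Since $V^I(0) = W^I(0)$, relation (A7.1.49) now yields that

$$\varphi'(t) = -t\bigl[\widetilde V_I(m) + \widetilde W_I(m)\bigr](1+o(1)).$$

Hence from (A7.1.28) we obtain the representation

$$\begin{aligned}
\varphi(t) - 1 &= \vartheta\int_0^{1/m}\varphi'(\vartheta u)\,du = -\int_0^{1/m}u\bigl[\widetilde V_I(1/u) + \widetilde W_I(1/u)\bigr]\,du\,(1+o(1))\\
&\sim -\frac{1}{2m^2}\bigl[\widetilde V_I(m) + \widetilde W_I(m)\bigr] \sim -\frac{1}{2m^2}\mathbf E(\xi^2;\ -m\le\xi<m)
\end{aligned}$$

by virtue of (A7.1.53) and a similar relation for $\widetilde W_I$. Turning now to the definition of the function $Y(x) = x^{-2}L_Y(x)$ in (A7.1.5) and putting

$$b(n) := Y^{(-1)}(1/n),\qquad t = \mu/b(n),$$

we get

$$n(\varphi(t)-1) \sim -\frac n2 Y\bigl(b(n)/|\mu|\bigr) \sim -\frac{n\mu^2}{2}Y(b(n)) \to -\frac{\mu^2}{2}.$$

The theorem is proved. □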


7.2  The  Integro-Local  and  Local  Limit  Theorems 

In this section we will prove Theorems 8.8.2-8.8.4. We will begin with the integro-local theorem.

Theorem A7.2.1 (Integro-local Stone's theorem) Let $\xi$ be a non-lattice random variable and the conditions of Theorem A7.1.1 be satisfied. Then, for each fixed $\Delta>0$,

$$\mathbf P\bigl(S_n\in\Delta[x)\bigr) = \frac{\Delta}{b(n)}f^{(\beta,\rho)}\Bigl(\frac{x}{b(n)}\Bigr) + o\Bigl(\frac{1}{b(n)}\Bigr)\quad\text{as } n\to\infty,$$

where the remainder term $o\bigl(\frac{1}{b(n)}\bigr)$ is uniform in $x$.

Proof of Theorem A7.2.1 The proof is analogous to the proof of Theorem 8.7.1. We will again use the smoothing approach and consider, along with the sums $S_n$, the sums

$$Z_n = S_n + \theta\eta,$$

where $\theta = \mathrm{const}$ and $\eta$ is chosen so that its ch.f. is equal to 0 outside a finite interval. For instance, we can choose $\eta$ as in Sect. 8.7.3, i.e., with the ch.f. $\varphi_\eta(t) = \max(0, 1-|t|)$. Then equality (8.7.19) will still be valid with the same decomposition of the integral on its right-hand side into the subintegral $I_1$ over the domain $|t|<\gamma$ and $I_2$ over the domain $\gamma\le|t|\le1/\theta$. Here estimating $I_2$ can be done in the same way as in Theorem 8.7.1.

For the sake of brevity, put $\widehat\varphi(t) := \varphi_{\eta^{(\Delta)}}(t)\varphi_{\theta\eta}(t)$. Then, for the integral $I_1$ with $x = vb(n)$, we have

$$I_1 = \int_{|t|<\gamma}e^{-itx}\varphi^n(t)\widehat\varphi(t)\,dt = \frac{1}{b(n)}\int_{|u|<\gamma b(n)}e^{-iuv}\varphi^n\Bigl(\frac{u}{b(n)}\Bigr)\widehat\varphi\Bigl(\frac{u}{b(n)}\Bigr)\,du. \tag{A7.2.1}$$


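As was shown in the proof of Theorem 8.1.1, for each $u$ we have

$$\varphi^n\Bigl(\frac{u}{b(n)}\Bigr)\to\varphi^{(\beta,\rho)}(u)\quad\text{as } n\to\infty,$$

and, moreover, for some $c>0$ and $\gamma>0$ small enough, by virtue of, say, (A7.1.23) and (A7.1.32), we have

$$\operatorname{Re}(\varphi(t)-1)\le -cF_0\Bigl(\frac{1}{|t|}\Bigr),$$

and, for any $\varepsilon>0$ and all $n$ large enough,

$$n\operatorname{Re}\Bigl(\varphi\Bigl(\frac{u}{b(n)}\Bigr)-1\Bigr)\le -cnF_0\Bigl(\frac{b(n)}{|u|}\Bigr)\le -c|u|^{\beta-\varepsilon}.$$

Here we used the properties of the r.v.f. $F_0$. Moreover,

$$\widehat\varphi\Bigl(\frac{u}{b(n)}\Bigr)\to1\quad\text{as } n\to\infty,\qquad \Bigl|\widehat\varphi\Bigl(\frac{u}{b(n)}\Bigr)\Bigr|\le1.$$

The above also implies that, for all $u$ such that $|u|<\gamma b(n)$,

$$\Bigl|\varphi^n\Bigl(\frac{u}{b(n)}\Bigr)\Bigr|\le e^{-c|u|^{\beta-\varepsilon}}. \tag{A7.2.2}$$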


The obtained relations mean that we can use the dominated convergence theorem in (A7.2.1) which implies

$$\lim_{n\to\infty}b(n)I_1 = \int e^{-iuv}\varphi^{(\beta,\rho)}(u)\,du \tag{A7.2.3}$$

uniformly in $v$, since the right-hand side of (A7.2.1) is uniformly continuous in $v$. On the right-hand side of (A7.2.3) is the result of the application of the inversion formula (up to the factor $1/2\pi$) to the ch.f. $\varphi^{(\beta,\rho)}$. This means that

$$\lim_{n\to\infty}b(n)I_1 = 2\pi f^{(\beta,\rho)}(v).$$

We have established that, for $x = vb(n)$, as $n\to\infty$,

$$\mathbf P\bigl(Z_n\in\Delta[x)\bigr) = \frac{\Delta}{b(n)}f^{(\beta,\rho)}\Bigl(\frac{x}{b(n)}\Bigr) + o\Bigl(\frac{1}{b(n)}\Bigr)$$

uniformly in $v$ (and hence in $x$).

To prove the theorem it remains to use Lemma 8.7.1. The theorem is proved. □


The  proofs  of  the  local  Theorems  8.8.3  and  8.8.4  can  be  obtained  by  an  obvious 
similar  modification  of  the  proofs  of  Theorems  8.7.2  and  8.7.3  under  the  conditions 
of  Theorem  8.8.1. 


Appendix  8 

Upper  and  Lower  Bounds  for  the  Distributions 
of  the  Sums  and  the  Maxima  of  the  Sums 
of  Independent  Random  Variables 


Let $\xi_1, \xi_2, \ldots$ be independent identically distributed random variables,

$$S_n = \sum_{i=1}^n \xi_i,\qquad \overline S_n = \max_{1\le k\le n}S_k.$$

The main goal of this appendix is to obtain upper and lower bounds for the probabilities $\mathbf P(S_n\ge x)$ and $\mathbf P(\overline S_n\ge x)$. These bounds were used in Sect. 9.5 to find the asymptotics of the probabilities of large deviations for $S_n$ and $\overline S_n$.


8.1  Upper  Bounds  Under  the  Cramer  Condition 

In this section we will assume that the following one-sided Cramer condition is met:

[C] There exists a $\lambda>0$ such that

$$\psi(\lambda) = \mathbf Ee^{\lambda\xi}<\infty. \tag{A8.1.1}$$

The following analogue of the exponential Chebyshev inequality holds true for $\mathbf P(\overline S_n\ge x)$.

Theorem A8.1.1 For all $n\ge1$, $x\ge0$ and $\lambda\ge0$, we have

$$\mathbf P(\overline S_n\ge x)\le e^{-\lambda x}\max\bigl(1,\psi^n(\lambda)\bigr). \tag{A8.1.2}$$

Proof As $\eta(x) := \inf\{k\ge1 : S_k\ge x\}\le\infty$ is a Markov time, the event $\{\eta(x)=k\}$ is independent of the random variables $S_n - S_k$. Therefore

$$\begin{aligned}
\psi^n(\lambda) = \mathbf Ee^{\lambda S_n} &\ge \sum_{k=1}^n \mathbf E\bigl(e^{\lambda S_n};\ \eta(x)=k\bigr) \ge \sum_{k=1}^n \mathbf E\bigl(e^{\lambda(x+S_n-S_k)};\ \eta(x)=k\bigr)\\
&= e^{\lambda x}\sum_{k=1}^n \psi^{n-k}(\lambda)\mathbf P(\eta(x)=k) \ge e^{\lambda x}\min\bigl(1,\psi^n(\lambda)\bigr)\mathbf P(\overline S_n\ge x).
\end{aligned}$$

This immediately implies (A8.1.2). The theorem is proved. □
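
Inequality (A8.1.2) can be illustrated on a case where the left-hand side is computable exactly. The sketch below is only an example under stated assumptions: it takes the symmetric walk with $\xi=\pm1$ (so that $\psi(\lambda)=\cosh\lambda$), computes $\mathbf P(\overline S_n\ge x)$ by dynamic programming over the positions of paths that have not yet reached the level $x$, and evaluates the right-hand side of (A8.1.2). The function names are hypothetical.

```python
import math


def prob_max_reaches(n: int, x: int) -> float:
    """Exact P(max_{1<=k<=n} S_k >= x) for the symmetric +-1 walk, integer x >= 1."""
    alive = {0: 1.0}  # alive[s]: probability of being at s without having reached x
    hit = 0.0         # probability that level x has already been reached
    for _ in range(n):
        nxt = {}
        for s, p in alive.items():
            for step in (-1, 1):
                t = s + step
                if t >= x:
                    hit += 0.5 * p
                else:
                    nxt[t] = nxt.get(t, 0.0) + 0.5 * p
        alive = nxt
    return hit


def chebyshev_type_bound(n: int, x: float, lam: float) -> float:
    """Right-hand side of (A8.1.2) with psi(lam) = cosh(lam)."""
    psi = math.cosh(lam)  # E exp(lam * xi) for xi = +-1 with probability 1/2 each
    return math.exp(-lam * x) * max(1.0, psi ** n)
```

For example, comparing `prob_max_reaches(50, 10)` with `chebyshev_type_bound(50, 10, 0.2)` shows the exact probability lying below the bound; the value $\lambda=0.2=x/n$ is a natural choice here since $\ln\cosh\lambda\approx\lambda^2/2$ for small $\lambda$.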

If $\psi(\lambda)\ge1$ for $\lambda\ge0$ (this is always the case if there exists $\mathbf E\xi\ge0$) then the right-hand side of (A8.1.2) is equal to $e^{-\lambda x}\psi^n(\lambda)$, and the inequality (A8.1.2) itself can also be obtained as a consequence of the well-known Kolmogorov-Doob inequality for submartingales (see Theorem 15.3.4, where one has to put $X_n := S_n$). Thus, if $\mathbf E\xi\ge0$ then

$$\mathbf P(\overline S_n\ge x)\le e^{-\lambda x+n\ln\psi(\lambda)}.$$

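Choosing the best possible value of $\lambda$ we obtain the following inequality.

Corollary A8.1.1 If $\mathbf E\xi\ge0$ then, for all $n\ge1$ and $x\ge0$, we have

$$\mathbf P(\overline S_n\ge x)\le e^{-n\Lambda(\alpha)},$$

where

$$\alpha := \frac xn,\qquad \Lambda(\alpha) := \sup_\lambda\bigl(\lambda\alpha - \ln\psi(\lambda)\bigr).$$

The function $\Lambda(\alpha)$ is the rate function introduced in Sect. 9.1. Its basic properties were stated in that section. In particular, for $\mathbf E\xi=0$ and $\mathbf E\xi^2=\sigma^2<\infty$, the asymptotic equivalence $\Lambda(\alpha)\sim\frac{\alpha^2}{2\sigma^2}$ as $\alpha\to0$ takes place, which yields that, for $x = o(n)$,

$$\mathbf P(\overline S_n\ge x)\le\exp\Bigl\{-\frac{x^2}{2n\sigma^2}(1+o(1))\Bigr\}. \tag{A8.1.3}$$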
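
As an illustration of the rate function in Corollary A8.1.1, the sketch below evaluates $\Lambda(\alpha)=\sup_\lambda(\lambda\alpha-\ln\psi(\lambda))$ numerically for the symmetric $\pm1$ walk, an assumed example in which $\psi(\lambda)=\cosh\lambda$ and $\sigma=1$. The supremum is located by ternary search (the objective is concave in $\lambda$) and compared with the known closed form $\frac{1+\alpha}{2}\ln(1+\alpha)+\frac{1-\alpha}{2}\ln(1-\alpha)$ for $0\le\alpha<1$. The function names and search interval are illustrative choices.

```python
import math


def rate_numeric(alpha: float, lo: float = 0.0, hi: float = 50.0, iters: int = 200) -> float:
    """sup over lam in [lo, hi] of lam*alpha - ln cosh(lam), found by ternary search."""

    def objective(lam: float) -> float:
        return lam * alpha - math.log(math.cosh(lam))

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective((lo + hi) / 2.0)


def rate_closed(alpha: float) -> float:
    """Closed form of the rate function of the symmetric +-1 walk, 0 <= alpha < 1."""
    return 0.5 * (1.0 + alpha) * math.log(1.0 + alpha) + 0.5 * (1.0 - alpha) * math.log(1.0 - alpha)
```

For small $\alpha$ the ratio `rate_closed(alpha) / (alpha**2 / 2)` is close to 1, in line with the equivalence $\Lambda(\alpha)\sim\alpha^2/(2\sigma^2)$ used to obtain (A8.1.3).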


8.2  Upper  Bounds  when  the  Cramer  Condition  Is  Not  Met 

In this section we will assume that

$$\mathbf E\xi = 0,\qquad \mathbf E\xi^2 = \sigma^2<\infty. \tag{A8.2.1}$$

For simplicity's sake, without losing generality, in what follows we will put $\sigma=1$. The bounds will be obtained for the deviation zone $x>\sqrt n$ which is adjacent to the zone of “normal deviations” where

$$\mathbf P(S_n\ge x)\sim1-\Phi\Bigl(\frac{x}{\sqrt n}\Bigr) \tag{A8.2.2}$$

(uniformly in $x\in(0,N_n\sqrt n)$, where $N_n\to\infty$ slowly enough as $n\to\infty$; see Sect. 8.2). Moreover, it was established in Sect. 19.1 that, in the normal deviations zone,

$$\mathbf P(\overline S_n\ge x)\sim2\Bigl(1-\Phi\Bigl(\frac{x}{\sqrt n}\Bigr)\Bigr). \tag{A8.2.3}$$


To derive upper bounds in the zone $x>\sqrt n$ when the Cramer condition [C] is not met, we will need additional conditions on the behaviour of the right tail $F_+(t) = \mathbf P(\xi\ge t)$ of the distribution $\mathbf F$.

Namely, we will assume that the following condition is satisfied.

[<] For the right tail $F_+(t) = \mathbf P(\xi\ge t)$ there exists a regularly varying (see Appendix 6) majorant $V(t)$:

$$F_+(t)\le V(t) := t^{-\beta}L(t)\quad\text{for all } t>0,$$

where $\beta>2$ and $L$ is a slowly varying function (s.v.f., see Appendix 6).


By virtue of (A8.2.2) and (A8.2.3), for deviations $x\le N_n\sqrt n$, $n\to\infty$, it would be natural to expect upper bounds with an exponential right-hand side $e^{-x^2/(2n)}$ (cf. (A8.1.3)). On the other hand, Theorem A6.4.3(iii) of Appendix 6 implies that, for $F_+(t) = V(t)\in\mathcal R$ and any fixed $n$ we have, as $x\to\infty$,

$$\mathbf P(S_n\ge x)\sim nV(x). \tag{A8.2.4}$$

This relation clearly holds true if $n\to\infty$ slowly enough (as $x\to\infty$).

The asymptotics (A8.2.2) and (A8.2.4) merge with each other remarkably as follows:

$$\mathbf P(S_n\ge x)\sim\Bigl(1-\Phi\Bigl(\frac{x}{\sqrt n}\Bigr)\Bigr) + nV(x) \tag{A8.2.5}$$

as $n\to\infty$ for all $x>\sqrt n$ (for more details see, e.g., [8] and the bibliography therein). Relation (A8.2.5) allows us to “guess” the threshold values of $x = b(n)$ for which asymptotics (A8.2.2) changes to asymptotics (A8.2.4). To find such $x$ it suffices to equate the logarithms of the right-hand sides of (A8.2.2) and (A8.2.4):

$$-\frac{x^2}{2n} = \ln nV(x) = \ln n - \beta\ln x + o(\ln x).$$

The main part $b(n)$ of the solution to this equation, as it is not hard to see, has the form

$$b(n) = \sqrt{(\beta-2)n\ln n}$$

(we exclude the trivial case $n=1$).
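
The claim about the main part of the solution can be explored numerically. The sketch below is an illustration under simplifying assumptions: it takes the pure power tail $V(x)=x^{-\beta}$ (that is, $L\equiv1$), drops the $o(\ln x)$ term, and solves $x^2/(2n)=\beta\ln x-\ln n$ by bisection on the branch $x>\sqrt{\beta n}$, where the difference of the two sides is increasing. It assumes $n$ is large enough for that difference to be negative at $x=\sqrt{\beta n}$. The function names are hypothetical.

```python
import math


def b(n: float, beta: float) -> float:
    """Main part of the threshold: sqrt((beta - 2) * n * ln n)."""
    return math.sqrt((beta - 2.0) * n * math.log(n))


def threshold(n: float, beta: float) -> float:
    """Root x > sqrt(beta*n) of x^2/(2n) = beta*ln(x) - ln(n), assuming V(x) = x**(-beta)."""

    def gap(x: float) -> float:
        return x * x / (2.0 * n) - beta * math.log(x) + math.log(n)

    lo = math.sqrt(beta * n)  # gap decreases up to this point and increases afterwards
    hi = lo
    while gap(hi) <= 0.0:     # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):      # plain bisection
        mid = 0.5 * (lo + hi)
        if gap(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With $\beta=3$ the ratio `threshold(n, 3.0) / b(n, 3.0)` is roughly 1.30 for $n=10^6$ and roughly 1.18 for $n=10^{12}$. The approach to 1 is slow because the neglected correction is of order $\ln\ln n/\ln n$, which is why $b(n)$ is described only as the main part of the solution.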

In what follows, we will represent deviations $x$ as $x = sb(n)$. Based on the above, it is natural to expect (and it can be easily verified) that the first term will dominate on the right-hand side of (A8.2.5) if $s<1$, while the second will dominate if $s>1$. Accordingly, for small $s$ (but such that $x>\sqrt n$), we will have the above-mentioned exponential bounds for $\mathbf P(S_n\ge x)$, while for large $s$ there will hold bounds of the form $nV(x)$ (note that $nV(x)\to0$ for $x\ge b(n)$ and $\beta>2$).

The above claim is confirmed by the assertions below. Along with $x$ introduce deviations

$$y = \frac xr,$$

where $r\ge1$ is fixed, and put

$$B_j := \{\xi_j<y\},\qquad B := \bigcap_{j=1}^n B_j.$$

Theorem A8.2.1 Let conditions (A8.2.1) and [<] be satisfied.

(i) For any fixed $h>1$, $s_0>0$, $x = sb(n)$, $s\ge s_0$ and all $\Pi := nV(x)$ small enough, we have

$$P := \mathbf P(\overline S_n\ge x;\ B)\le e^r\Bigl(\frac{\Pi(y)}{r}\Bigr)^{r-\theta}, \tag{A8.2.6}$$

where

$$\Pi(y) := nV(y),\qquad \theta := \frac{hr^2}{4s^2}\Bigl(1+b\frac{\ln s}{\ln n}\Bigr),\qquad b := \frac{2\beta}{\beta-2}.$$

(ii) For any fixed $h>1$, $\tau>0$, for $x = sb(n)>\sqrt n$, $s^2<(h-\tau)/2$, and all $n$ large enough, we have

$$P\le e^{-x^2/(2nh)}. \tag{A8.2.7}$$

Corollary A8.2.1 (a) If $s\to\infty$ then

$$\mathbf P(\overline S_n\ge x)\le nV(x)(1+o(1)). \tag{A8.2.8}$$

(b) If $s^2\ge s_0^2$ for some fixed $s_0>1$ then, for all $nV(x)$ small enough,

$$\mathbf P(\overline S_n\ge x)\le cnV(x),\qquad c = \mathrm{const}. \tag{A8.2.9}$$

(c) For any fixed $h>1$, $\tau>0$, for $s^2<(h-\tau)/2$, $x>\sqrt n$ and all $n$ large enough,

$$\mathbf P(\overline S_n\ge x)\le e^{-x^2/(2nh)}. \tag{A8.2.10}$$

Remark A8.2.1 It is not hard to verify (see the proofs of Theorem A8.2.1 and Corollary A8.2.1) that there exists a function $\varepsilon(t)\downarrow0$ as $t\uparrow\infty$ such that one has, along with (A8.2.8), the relation

$$\sup_{x:\ s\ge t}\frac{\mathbf P(\overline S_n\ge x)}{nV(x)}\le1+\varepsilon(t).$$

Proof of Corollary A8.2.1 The proof is based on the inequality

$$\mathbf P(\overline S_n\ge x)\le\mathbf P(\overline B) + \mathbf P(\overline S_n\ge x;\ B)\le nV(y) + P. \tag{A8.2.11}$$

Since $\theta\to0$ as $s\to\infty$, we see that, for any fixed $\varepsilon>0$ and all $\Pi = nV(x)$ small enough, we have $P\le c(nV(y))^{r-\varepsilon}$. Putting $r := 1+2\varepsilon$, we obtain from (A8.2.11) and (A8.2.6) that

$$\mathbf P(\overline S_n\ge x)\le nV(y) + c\bigl(nV(y)\bigr)^{1+\varepsilon}\sim n(1+2\varepsilon)^{\beta}V(x).$$

Since the left-hand side of this inequality does not depend on $\varepsilon$, relation (A8.2.8) follows.

We now prove (b). If $s\to\infty$ then (b) follows from (a). If $s$ is bounded then necessarily $n\to\infty$ (since $nV(x)\to0$) and hence

$$r-\theta = r - \frac{hr^2}{4s^2}\Bigl(1+b\frac{\ln s}{\ln n}\Bigr) = f(r,s) + o(1),$$

where the function

$$f(r,s) := r - \frac{hr^2}{4s^2}$$

attains its maximum $f(r_0,s) = s^2/h$ in $r$ at the point $r_0 = 2s^2/h$. Moreover, $f(r,s)$ strictly increases in $s$. Therefore, for $r_0 = 2s^2/h$, we obtain

$$f(r_0,s) = \frac{s^2}{h}, \tag{A8.2.12}$$

$$r_0-\theta\ge\frac{s^2}{h} + o(1)\quad\text{as } n\to\infty. \tag{A8.2.13}$$

Choose $h$ so close to 1 and $\tau>0$ so small that $h+\tau\le s_0^2$. Putting $r := r_0$, for $s^2\ge s_0^2\ge h+\tau$ and as $n\to\infty$, we get from (A8.2.6), (A8.2.12) and (A8.2.13) that

$$\mathbf P(\overline S_n\ge x)\le nV(y) + c\bigl(nV(y)\bigr)^{1+\tau/2}\sim nV\Bigl(\frac{x}{r_0}\Bigr)\sim r_0^{\beta}nV(x).$$

This proves (b).

Relation (c) for $y = x$ follows from the inequality (see (A8.2.7) and (A8.2.11))

$$\mathbf P(\overline S_n\ge x)\le nV(x) + e^{-x^2/(2nh)}, \tag{A8.2.14}$$

where, for $s^2<(h-\tau)/2$,

$$e^{-x^2/(2nh)}\ge\exp\Bigl\{-\frac{h-\tau}{2}\cdot\frac{(\beta-2)n\ln n}{2nh}\Bigr\}>n^{-(\beta-2)/4}.$$

On the other hand, we have $x>\sqrt n$,

$$nV(x)\le nV(\sqrt n) = n^{-(\beta-2)/2}L^*(n),$$

where $L^*$ is an s.v.f. Therefore the second term dominates on the right-hand side of (A8.2.14). Slightly changing $h$ if necessary, we obtain (c). Corollary A8.2.1 is proved. □


Remark A8.2.2 One can see from the proof of the corollary that the main contribution to the bound for the probability $\mathbf P(\overline S_n\ge x)$ under the conditions of assertions (a) and (b) comes from the event $\overline B = \{\max_{j\le n}\xi_j\ge y\}$ with $y$ close to $x$, so that the most probable trajectory of $\{S_k\}_{k=1}^n$ that reaches the level $x$ contains at least one jump $\xi_j$ of size comparable to $x$.


Proof of Theorem A8.2.1 In our case, the Cramer condition [C] is not met. In order to use Theorem A8.1.1 in such a situation, we introduce “truncated” random variables with distributions that coincide with the conditional distribution of $\xi$ given $\{\xi<y\}$ for some level $y$ the choice of which will be at our disposal. Namely, we introduce independent identically distributed random variables $\xi_j^{(y)}$, $j = 1,2,\ldots$, with the distribution function

$$\mathbf P\bigl(\xi_j^{(y)}<t\bigr) = \mathbf P(\xi<t\mid\xi<y) = \frac{\mathbf P(\xi<t)}{\mathbf P(\xi<y)},\qquad t\le y,$$

and put

$$S_n^{(y)} := \sum_{j=1}^n \xi_j^{(y)},\qquad \overline S_n^{(y)} := \max_{k\le n}S_k^{(y)}.$$

Then

$$P = \mathbf P(\overline S_n\ge x;\ B) = \bigl(\mathbf P(\xi<y)\bigr)^n\mathbf P\bigl(\overline S_n^{(y)}\ge x\bigr). \tag{A8.2.15}$$

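Applying Theorem A8.1.1 to the variables $\xi_j^{(y)}$, we obtain that, for any $\lambda\ge0$,

$$\mathbf P\bigl(\overline S_n^{(y)}\ge x\bigr)\le e^{-\lambda x}\bigl[\max\bigl\{1,\ \mathbf Ee^{\lambda\xi^{(y)}}\bigr\}\bigr]^n.$$

Since

$$\mathbf Ee^{\lambda\xi^{(y)}} = \frac{R(\lambda,y)}{\mathbf P(\xi<y)},\qquad\text{where}\quad R(\lambda,y) := \int_{-\infty}^y e^{\lambda t}\,\mathbf F(dt),$$

we arrive at the following basic inequality. For $x, y, \lambda\ge0$,

$$P = \mathbf P(\overline S_n\ge x;\ B)\le e^{-\lambda x}\bigl[\max\{\mathbf P(\xi<y),\ R(\lambda,y)\}\bigr]^n\le e^{-\lambda x}\max\{1,\ R^n(\lambda,y)\}. \tag{A8.2.16}$$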


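Thus, the main problem is to bound the integral $R(\lambda,y)$. Put

$$M(v) := \frac{v}{\lambda}$$

and represent $R(\lambda,y)$ as

$$R(\lambda,y) = I_1 + I_2,$$

where, for a fixed $\varepsilon>0$,

$$I_1 := \int_{-\infty}^{M(\varepsilon)}e^{\lambda t}\,\mathbf F(dt) = \int_{-\infty}^{M(\varepsilon)}\Bigl(1+\lambda t+\frac{\lambda^2t^2}{2}e^{\lambda\theta(t)}\Bigr)\mathbf F(dt),\qquad 0\le\frac{\theta(t)}{t}\le1. \tag{A8.2.17}$$

Here

$$\int_{-\infty}^{M(\varepsilon)}\mathbf F(dt) = 1-V(M(\varepsilon))\le1,\qquad \int_{-\infty}^{M(\varepsilon)}t\,\mathbf F(dt) = -\int_{M(\varepsilon)}^\infty t\,\mathbf F(dt)\le0, \tag{A8.2.18}$$

$$\int_{-\infty}^{M(\varepsilon)}t^2e^{\lambda\theta(t)}\,\mathbf F(dt)\le e^{\varepsilon}\int_{-\infty}^{M(\varepsilon)}t^2\,\mathbf F(dt)\le e^{\varepsilon} =: h. \tag{A8.2.19}$$

Therefore,

$$I_1\le1+\frac{\lambda^2h}{2}. \tag{A8.2.20}$$

Estimate now

$$I_2 := -\int_{M(\varepsilon)}^y e^{\lambda t}\,dF_+(t)\le V(M(\varepsilon))e^{\varepsilon} + \lambda\int_{M(\varepsilon)}^y V(t)e^{\lambda t}\,dt. \tag{A8.2.21}$$

First consider, for $M(\varepsilon)<M(2\beta)<y$, the subintegral

$$I_{2,1} := \int_{M(\varepsilon)}^{M(2\beta)}V(t)e^{\lambda t}\,dt. \tag{A8.2.22}$$

For $t = v/\lambda$, as $\lambda\to0$, we have

$$V(t)e^{\lambda t} = V\Bigl(\frac v\lambda\Bigr)e^{v}\sim V\Bigl(\frac1\lambda\Bigr)f(v),$$

where the function

$$f(v) := v^{-\beta}e^{v}$$

is convex on $(0,\infty)$. Therefore

$$\lambda I_{2,1}\le\frac\lambda2\bigl(M(2\beta)-M(\varepsilon)\bigr)V\Bigl(\frac1\lambda\Bigr)\bigl(f(\varepsilon)+f(2\beta)\bigr)(1+o(1))\le cV\Bigl(\frac1\lambda\Bigr). \tag{A8.2.23}$$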


We now proceed to estimating the remaining subintegral

$$I_{2,2} := \int_{M(2\beta)}^y V(t)e^{\lambda t}\,dt.$$

For brevity's sake, put $M(2\beta) =: M$. We will choose $\lambda$ so that

$$\mu = \lambda y\to\infty\qquad(y\gg1/\lambda) \tag{A8.2.24}$$

as $x\to\infty$. Substituting the variable $(y-t)\lambda =: u$ we obtain

$$\lambda I_{2,2} = e^{\lambda y}V(y)\int_0^{(y-M)\lambda}V\Bigl(y-\frac u\lambda\Bigr)V^{-1}(y)e^{-u}\,du. \tag{A8.2.25}$$


Consider the integral on the right-hand side of (A8.2.25). Since $1/\lambda\ll y$, the integrand

$$r_{y,\lambda}(u) := \frac{V(y-u/\lambda)}{V(y)}$$

converges to 1 for each fixed $u$. In order to use the dominated convergence theorem which implies that the integral on the right-hand side of (A8.2.25) converges, as $y\to\infty$, to

$$\int_0^\infty e^{-u}\,du = 1, \tag{A8.2.26}$$

it remains to estimate the growth rate of the function $r_{y,\lambda}(u)$ as $u$ increases. By the properties of r.v.f.s (see Theorem A6.2.1(iii) in Appendix 6), for all $\lambda$ small enough (or $M$ large enough; recall that $y-u/\lambda\ge M$ in the integrand in (A8.2.25)), we have

$$r_{y,\lambda}(u)\le\Bigl(1-\frac{u}{\lambda y}\Bigr)^{-3\beta/2} =: g(u).$$

Since $g(0) = 1$ and $\lambda y-u\ge M\lambda = 2\beta$, in this domain

$$\bigl(\ln g(u)\bigr)' = \frac{3\beta}{2(\lambda y-u)}\le\frac{3\beta}{4\beta} = \frac34,\qquad \ln g(u)\le\frac{3u}{4},\qquad r_{y,\lambda}(u)\le e^{3u/4}.$$

This means that the integrand in (A8.2.25) is dominated by the exponential $e^{-u/4}$, and the use of the dominated convergence theorem is justified. Therefore, due to the convergence of the integral in (A8.2.25) to the limit (A8.2.26), we obtain

$$\lambda I_{2,2}\sim e^{\lambda y}V(y)\int_0^\infty e^{-u}\,du = e^{\mu}V(y),$$

and it is not hard to find a function $\varepsilon(\mu)\downarrow0$ as $\mu\uparrow\infty$ such that

$$\lambda I_{2,2}\le e^{\lambda y}V(y)(1+\varepsilon(\mu)). \tag{A8.2.27}$$

Summarising (A8.2.20)-(A8.2.23) and (A8.2.27), we obtain

$$R(\lambda,y)\le1+\frac{\lambda^2h}{2}+cV\Bigl(\frac1\lambda\Bigr)+V(y)e^{\lambda y}(1+\varepsilon(\mu)), \tag{A8.2.28}$$

$$R^n(\lambda,y)\le\exp\Bigl\{\frac{n\lambda^2h}{2}+cnV\Bigl(\frac1\lambda\Bigr)+nV(y)e^{\lambda y}(1+\varepsilon(\mu))\Bigr\}. \tag{A8.2.29}$$


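First take $\lambda$ to be the value

$$\lambda = \frac1y\ln T$$

that “almost minimises” the function $-\lambda x + nV(y)e^{\lambda y}$, where $T := \frac{r}{nV(y)}$, so that $\mu = \ln T$. Note that, for such a choice of $\mu$ (or of $\lambda = y^{-1}\ln(r/\Pi(y))$), for $\Pi(y)\to0$ we have that $\mu = \lambda y\sim-\ln\Pi(y)\to\infty$ and hence that the assumption $y\gg1/\lambda$ we made in (A8.2.24) holds true. For such $\lambda$,

$$R^n(\lambda,y)\le\exp\Bigl\{\frac{n\lambda^2h}{2}+cnV\Bigl(\frac1\lambda\Bigr)+r(1+\varepsilon(\mu))\Bigr\}, \tag{A8.2.30}$$

where, by the properties of r.v.f.s, for any fixed $\delta>0$,

$$nV\Bigl(\frac1\lambda\Bigr) = nV\Bigl(\frac{y}{\ln T}\Bigr)\le cnV\Bigl(\frac{y}{|\ln nV(y)|}\Bigr)\le cnV(y)\bigl|\ln nV(y)\bigr|^{\beta+\delta}\to0 \tag{A8.2.31}$$

as $nV(y)\to0$. Therefore

$$\begin{aligned}
\ln P&\le-r\ln T+r+\frac{nh}{2y^2}\ln^2T+\varepsilon_1(T)\\
&=\Bigl[-r+\frac{nh}{2y^2}\ln T\Bigr]\ln T+r+\varepsilon_1(T),
\end{aligned}\tag{A8.2.32}$$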


where $\varepsilon_1(T)\downarrow0$ as $T\uparrow\infty$. If $x = sb(n)$, $b(n) = \sqrt{(\beta-2)n\ln n}$, and $nV(x)\to0$ then

$$\begin{aligned}
\ln T &= -\ln nV(x)+O(1) = -\ln n+\beta\ln s+\frac\beta2\ln n+O\bigl(\ln L(sb(n))\bigr)+O(\ln\ln n)\\
&= \frac{\beta-2}{2}\ln n\Bigl[1+b\frac{\ln s}{\ln n}\Bigr](1+o(1)),
\end{aligned}\tag{A8.2.33}$$

where $b = \frac{2\beta}{\beta-2}$ (the term $o(1)$ in the last equality appears because in our case either $n\to\infty$ or $s\to\infty$). Hence, by (A8.2.32),

$$\frac{nh}{2y^2}\ln T = \frac{hr^2}{4s^2}\Bigl[1+b\frac{\ln s}{\ln n}\Bigr](1+o(1)),$$

$$\ln P\le r-\Bigl[r-\frac{h'r^2}{4s^2}\Bigl(1+b\frac{\ln s}{\ln n}\Bigr)\Bigr]\ln T$$

for any $h'>h>1$ and $nV(x)$ small enough. This proves the first assertion of the theorem.

We now prove the second assertion of the theorem for “small” values of $s$ such that, for some $\tau > 0$,

\[ s^2 < \frac{h-\tau}{2}. \]

Since we always assume that $x > \sqrt{n}$, we also have

\[ s = \frac{x}{\sigma(n)} > \frac{1}{\sqrt{(\beta-2)\ln n}}, \]

and we can assume that $s^2 > n^{-\gamma}$ for some $\gamma > 0$ to be chosen below. This corresponds to the following domain of the values of $x^2$:

\[ cn^{1-\gamma}\ln n < x^2 < \frac{(h-\tau)(\beta-2)}{2}\,n\ln n. \tag{A8.2.34} \]

For such $x$, as will be shown below, the main contribution to the exponent on the right-hand side of (A8.2.29) comes from the quadratic term $n\lambda^2h/2$, and we will set

\[ \lambda := \frac{x}{nh}. \]

Then, for $y = x$ ($r = 1$, $\mu = x^2/(nh)$),


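\[ \ln P \le -\lambda x + \frac{n\lambda^2h}{2} + cnV\Big(\frac{1}{\lambda}\Big) + nV(y)e^{\lambda y}(1+\varepsilon(\mu)) = -\frac{x^2}{2nh} + cnV\Big(\frac{nh}{x}\Big) + nV(x)e^{x^2/(nh)}(1+\varepsilon(\mu)). \tag{A8.2.35} \]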


We show that the last two terms on the right-hand side are negligibly small as $n \to \infty$. Indeed, by the second inequality in (A8.2.34),

\[ nV\Big(\frac{nh}{x}\Big) \to 0 \quad \text{as } n \to \infty. \]
Further, by the first inequality in (A8.2.34),

\[ nV(x) \le n^{(2-\beta)/2+\gamma'}, \]

where $\gamma' > 0$ can be made arbitrarily small by the choice of $\gamma$. Moreover, by (A8.2.34),

\[ \frac{x^2}{nh} < \frac{(h-\tau)(\beta-2)\ln n}{2h} = \Big[\frac{\beta-2}{2} - \frac{\tau(\beta-2)}{2h}\Big]\ln n. \]

Therefore

\[ nV(x)e^{x^2/(nh)} \le n^{-\tau(\beta-2)/(2h)+\gamma'} \to 0 \]

for $\gamma' < \tau(\beta-2)/(2h)$ as $n \to \infty$.

Thus,

\[ \ln P \le -\frac{x^2}{2nh} + o(1). \]

Since $x^2/n > 1$, the term $o(1)$ in the last relation can be omitted by slightly changing $h > 1$. (Formally, we proved that, for any $h > 1$ and all $n$ large enough, inequality (A8.2.7) is valid with the $h$ on its right-hand side replaced with $h' > h$, where we can take, for instance, $h' = h + (h-1)/2$. Since $h' > 1$ can also be made arbitrarily close to 1 by the choice of $h$, the obtained relation is equivalent to the one from Theorem A8.2.1.) This proves (A8.2.7).

The  theorem  is  proved.  □ 


Comparing  the  assertions  of  Theorem  A8.2.1  and  Corollary  A8.1.1,  we  see  that, 
roughly  speaking,  for  s  <  1  /2  and  for  s  >  1  one  can  obtain  quite  satisfactory  and,  in 
a  certain  sense,  unimprovable  upper  bounds  for  the  probabilities  P  and  P (Sn  >  x). 


8.3  Lower  Bounds 


In this section we will again assume that conditions (A8.2.1) are satisfied. The lower bounds for $\mathbf{P}(S_n \ge x)$ (they will clearly hold for $\mathbf{P}(\overline{S}_n \ge x)$ as well) can be obtained in a much simpler way than the upper bounds and need essentially no assumptions.


Theorem A8.3.1 Let $\mathbf{E}\xi = 0$ and $\mathbf{E}\xi^2 = 1$. Then, for $y = x + t\sqrt{n-1}$,

\[ \mathbf{P}(S_n \ge x) \ge nF_+(y)\Big[1 - t^{-2} - \frac{n-1}{2}F_+(y)\Big]. \tag{A8.3.1} \]


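Proof Put $G_n := \{S_n \ge x\}$ and $B_j := \{\xi_j \ge y\}$. Then

\[ \mathbf{P}(S_n \ge x) \ge \mathbf{P}\Big(G_n; \bigcup_{j=1}^n B_j\Big) \ge \sum_{j=1}^n \mathbf{P}(G_nB_j) - \sum_{i<j\le n} \mathbf{P}(G_nB_iB_j) \ge \sum_{j=1}^n \mathbf{P}(G_nB_j) - \frac{n(n-1)}{2}F_+^2(y). \]

Here, for $y = x + t\sqrt{n-1}$,

\[ \mathbf{P}(G_nB_j) = \int_y^\infty \mathbf{P}(S_{n-1} \ge x-u)\,\mathbf{F}(du) \ge \mathbf{P}(S_{n-1} \ge x-y)F_+(y) = \mathbf{P}(S_{n-1} \ge -t\sqrt{n-1})F_+(y) = \big(1 - \mathbf{P}(S_{n-1} < -t\sqrt{n-1})\big)F_+(y), \]

where, by the Chebyshev inequality,

\[ \mathbf{P}(S_{n-1} < -t\sqrt{n-1}) \le t^{-2}. \]

As a result we get

\[ \mathbf{P}(S_n \ge x) \ge nF_+(y)(1 - t^{-2}) - \frac{n(n-1)}{2}F_+^2(y), \]

which is equivalent to (A8.3.1).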

The  theorem  is  proved.  □ 
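
Inequality (A8.3.1) holds for every $n$, $x$ and $t > 0$, so it can be checked on a law whose convolutions are computable exactly. The following is only an illustrative sketch under assumptions that are not in the text: the jumps take values on a lattice $\{kd : |k| \le K\}$ with symmetric weights proportional to $|k|^{-4}$, the step $d$ being chosen so that $\mathbf{E}\xi^2 = 1$. The names `lattice_law`, `tail` and `check_lower_bound` are hypothetical.

```python
import numpy as np

def lattice_law(K=200, beta=4.0):
    """Symmetric pmf on {k*d : |k| <= K} with p_k ~ |k|^(-beta), scaled to unit variance."""
    k = np.arange(-K, K + 1)
    p = np.zeros(k.size)
    nz = k != 0
    p[nz] = np.abs(k[nz]).astype(float) ** (-beta)
    p /= p.sum()
    d = 1.0 / np.sqrt(np.sum(k.astype(float) ** 2 * p))  # step giving E xi^2 = 1
    return k, p, d

def tail(k, p, d, x):
    """P(value >= x) for a law with atoms k*d and weights p (tiny slack for rounding)."""
    return float(p[k * d >= x - 1e-12].sum())

def check_lower_bound(n=20, xs=(5.0, 10.0, 20.0), ts=(1.5, 2.0, 4.0)):
    """Return tuples (x, t, exact P(S_n >= x), right-hand side of (A8.3.1))."""
    k, p, d = lattice_law()
    K = (k.size - 1) // 2
    pn = p.copy()
    for _ in range(n - 1):            # exact law of S_n by repeated convolution
        pn = np.convolve(pn, p)
    kn = np.arange(-n * K, n * K + 1)
    rows = []
    for x in xs:
        for t in ts:
            y = x + t * np.sqrt(n - 1)
            Fy = tail(k, p, d, y)     # F_+(y) = P(xi >= y)
            bound = n * Fy * (1 - t ** -2 - (n - 1) / 2 * Fy)
            rows.append((x, t, tail(kn, pn, d, x), bound))
    return rows

if __name__ == "__main__":
    for x, t, exact, bound in check_lower_bound():
        print(f"x={x:5.1f} t={t:3.1f}  P(S_n>=x)={exact:.3e}  bound={bound:.3e}")
```

The bound is usually far from sharp for moderate $x$; it becomes informative only when $x$ is large compared with $\sqrt{n}$, which is the regime of Corollary A8.3.1.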

Corollary A8.3.1 If $x \to \infty$ and $x \gg \sqrt{n}$ then, as $t \to \infty$,

\[ \mathbf{P}(S_n \ge x) \ge nF_+(y)(1 + o(1)). \tag{A8.3.2} \]

If, moreover, $F_+(t) \ge V(t)$, where $V$ is an r.v.f., then

\[ \mathbf{P}(S_n \ge x) \ge nV(x)(1 + o(1)). \]

Proof Since $y \ge x$, we have

\[ nF_+(y) \le ny^{-2} \le nx^{-2} = o(1). \]

This together with (A8.3.1) implies the first assertion of the corollary as $t \to \infty$. To obtain the second one, in (A8.3.2) one should take $t \to \infty$ such that $t = o(x/\sqrt{n})$. Then $y \sim x$ and $V(y) \sim V(x)$.

The  corollary  is  proved.  □ 


Appendix  9 

Renewal  Theorems 


The  main  goal  of  the  present  section  is  to  prove  Theorem  10.4.1,  the  key  renewal 
theorem  in  the  non-arithmetic  case  (in  the  terminology  of  Chap.  10).  We  will  also 
consider  some  refinements  and  extensions  of  the  theorem. 

First consider positive independent identically distributed random variables $\tau_j \stackrel{d}{=} \tau$ with distribution function $F$ and finite mean $a := \mathbf{E}\tau < \infty$. Here it will be more convenient to understand by the renewal function its left-continuous version

\[ H(t) := \sum_{k=0}^{\infty} F^{*k}(t), \qquad t \ge 0, \]

where $F^{*k}$ is the $k$-fold convolution of the distribution $F$ with itself, which is the distribution function of the sum $T_k = \tau_1 + \cdots + \tau_k$. We first prove the following key assertion.

Theorem A9.1 If $g$ is a directly integrable function and $\tau_j$ are non-arithmetic (see Chap. 10) then, as $t \to \infty$,

\[ \int_0^t g(t-u)\,dH(u) \to \frac{1}{a}\int_0^\infty g(u)\,du. \]

The  proof  of  the  theorem  mostly  follows  the  argument  suggested  in  [13]  and  will 
need  several  auxiliary  assertions. 

Lemma A9.1 Let $g$ be a bounded measurable function. The integral

\[ G(t) = \int_0^t g(t-u)\,dH(u) =: g * H(t) \tag{A9.1} \]

is the unique solution of the equation

\[ G(t) = g(t) + \int_0^t G(t-u)\,dF(u) = g(t) + G * F(t) \tag{A9.2} \]

in the class of functions bounded on finite intervals.

The function $G = H$ is the solution of (A9.2) when $g \equiv 1$. The function $G \equiv 1$ is the solution of (A9.2) when $g = 1 - F$.

Equation (A9.2) is called the renewal equation.

As we already noted in Theorem 10.4.1, one can associate, in an obvious way, measures $\mathbf{H}$ and $\mathbf{F}$ with the functions $H$ and $F$, and write the integrals in (A9.1) and (A9.2) as integrals with respect to the measures:

\[ \int_0^t g(t-u)\,\mathbf{H}(du) \quad\text{and}\quad \int_0^t G(t-u)\,\mathbf{F}(du), \]

respectively.

Proof of Lemma A9.1 Put

\[ H_n(t) := \sum_{k=0}^{n} F^{*k}(t). \]

The functions $G_n = g * H_n$ satisfy the equation $G_{n+1} = g + G_n * F$ and form an increasing sequence $G_n \uparrow$ which is bounded by Lemma 10.2.3. Therefore $G_n \uparrow G$, and passing to the limit in the equation for $G_n$ we obtain that $G$ satisfies (A9.2). To prove uniqueness note that the difference $V = G^{(1)} - G^{(2)}$ of two solutions $G^{(1)}$ and $G^{(2)}$ must satisfy the homogeneous equation $V = V * F$ and therefore also the relations $V = V * F^{*k}$ or, which is the same,

\[ V(t) = \int_0^t V(t-u)\,dF^{*k}(u). \]

But $F^{*k}(u) \to 0$ as $k \to \infty$ for $u \in [0,t]$. Since by the assumption $|V(u)| \le c$ on $[0,t]$, the integral on the right-hand side tends to 0 as $k \to \infty$. But $V$ does not depend on $k$, so that $V(t) = 0$. The last assertion of the lemma can be verified directly. The lemma is proved. □
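
The renewal equation (A9.2) can also be solved numerically, which gives a quick sanity check of Lemma A9.1 and of the limit in Theorem A9.1. The sketch below is an illustration only and rests on assumptions not made in the text: the Stieltjes integral is replaced by a first-order sum on a uniform grid, and $\tau$ is taken uniform on $[0,2]$ (so $a = 1$). The names `solve_renewal_equation` and `uniform_cdf` are hypothetical.

```python
import numpy as np

def solve_renewal_equation(g, F, T, h):
    """Approximate the solution of G = g + G*F on the grid t_i = i*h, 0 <= t_i <= T."""
    n = int(round(T / h))
    t = h * np.arange(n + 1)
    dF = np.diff(F(t))                 # dF[j-1] = F(t_j) - F(t_{j-1})
    gv = g(t)
    G = np.empty(n + 1)
    G[0] = gv[0]
    for i in range(1, n + 1):
        # int_0^{t_i} G(t_i - u) dF(u) ~ sum_{j=1}^{i} G(t_{i-j}) (F(t_j) - F(t_{j-1}))
        G[i] = gv[i] + np.dot(G[i - 1::-1], dF[:i])
    return t, G

def uniform_cdf(t):
    """Distribution function of the uniform law on [0, 2] (mean a = 1)."""
    return np.clip(np.asarray(t, dtype=float) / 2.0, 0.0, 1.0)

if __name__ == "__main__":
    # g = 1 - F should reproduce the solution G = 1 from Lemma A9.1.
    _, G_one = solve_renewal_equation(lambda t: 1.0 - uniform_cdf(t), uniform_cdf, 5.0, 0.01)
    print("max |G - 1| for g = 1 - F:", np.max(np.abs(G_one - 1.0)))
    # g(t) = exp(-t): Theorem A9.1 predicts G(t) -> (1/a) * int_0^inf g = 1.
    _, G_exp = solve_renewal_equation(lambda t: np.exp(-t), uniform_cdf, 40.0, 0.01)
    print("G(40) for g = exp(-t):", G_exp[-1])
```

The scheme is explicit because $F(0) = 0$: each value $G(t_i)$ uses only earlier grid values. A finer grid or a trapezoidal rule would reduce the discretisation error.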

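Note that if we considered functions $g$ of bounded variation, the assertion of Lemma A9.1 would immediately follow from the equation for the Laplace–Stieltjes transform $\widetilde{G}(\lambda) = \int_0^\infty e^{-\lambda t}\,dG(t)$ of $G$ which follows from (A9.2):

\[ \widetilde{G}(\lambda) = \widetilde{g}(\lambda) + \widetilde{G}(\lambda)\psi(\lambda), \tag{A9.3} \]

where

\[ \widetilde{g}(\lambda) := \int_0^\infty e^{-\lambda t}\,dg(t), \qquad \psi(\lambda) := \int_0^\infty e^{-\lambda t}\,dF(t). \]

Indeed, it follows from (A9.3) that

\[ \widetilde{G}(\lambda) = \frac{\widetilde{g}(\lambda)}{1-\psi(\lambda)} = \widetilde{g}(\lambda)\sum_{k=0}^{\infty}\psi^k(\lambda), \]

which is equivalent to (A9.1).

A point $t$ is said to be a point of growth of the distribution function $F$ provided that $F(t+\varepsilon) - F(t) > 0$ for any $\varepsilon > 0$.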


Lemma A9.2 Let the distribution $F$ be non-arithmetic and $Z$ be the set of all points of growth of $H$, i.e. points of growth of the functions $F, F^{*2}, F^{*3}, \ldots$ Then $Z$ is “asymptotically dense at infinity”, i.e., for any given $\varepsilon > 0$ and all $x$ large enough, the intersection $(x, x+\varepsilon) \cap Z$ is non-empty.


Proof Observe first that if $t_1$ is a point of growth of the distribution $F_1$ of a random variable $\tau$, and $t_2$ is a point of growth of the distribution $F_2$ of a random variable $\zeta$ which is independent of $\tau$, then $t = t_1 + t_2$ will be a point of growth of the distribution $F_1 * F_2$ of the variable $\tau + \zeta$. Indeed,

\[ \mathbf{P}(t \le \tau + \zeta < t + \varepsilon) \ge \mathbf{P}\Big(t_1 \le \tau < t_1 + \frac{\varepsilon}{2}\Big)\,\mathbf{P}\Big(t_2 \le \zeta < t_2 + \frac{\varepsilon}{2}\Big) > 0. \]

Let, further, $x < y$ be two points of the set $Z$, and $\Delta := y - x$. The following alternative takes place: either

(1) for any $\varepsilon > 0$ there exist $x$ and $y$ such that $\Delta < \varepsilon$, or

(2) there exists a $\delta > 0$ such that $\Delta \ge \delta$ for all $x$ and $y$ from $Z$.

Put $I_n := [nx, ny]$. If $n\Delta > x$ then that interval contains $[nx, (n+1)x]$ as a subset, and therefore any point $v > v_0 = x^2/\Delta$ belongs to at least one of the intervals $I_1, I_2, \ldots$

By virtue of the above observation, the $n+1$ points $nx + k\Delta = (n-k)x + ky$, $k = 0, \ldots, n$, belong to $Z$ and divide $I_n$ into $n$ subintervals of length $\Delta$. This means that, for any point $v > v_0$, the distance between $v$ and the points from $Z$ is at most $\Delta/2$.

This implies the assertion of the lemma when (1) holds.

If (2) is true, we can assume that $x$ and $y$ are chosen so that $\Delta < 2\delta$. Then the points of the form $nx + k\Delta$ exhaust all the points from $Z$ lying inside $I_n$. Since the point $(n+1)x$ is among these points, the value $x$ is a multiple of $\Delta$, and all the points of $Z$ lying inside $I_n$ are multiples of $\Delta$. Now let $z$ be an arbitrary point of growth of $F$. For sufficiently large $n$, the interval $I_n$ contains a point of the form $z + k\Delta$, and since the latter belongs to $Z$, the value $z$ is also a multiple of $\Delta$. Thus $F$ is an arithmetic distribution, so that case (2) cannot take place. The lemma is proved. □
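
The statement of Lemma A9.2 is easy to see on a concrete non-arithmetic law. The sketch below assumes, purely for illustration, that $F$ is concentrated at the two points $1$ and $\sqrt{2}$; then $Z$ consists of the numbers $m + k\sqrt{2}$ with integers $m, k \ge 0$ not both zero, and the largest gap between neighbouring points of $Z$ shrinks as one moves to the right. The names `growth_points` and `max_gap` are hypothetical.

```python
import math

def growth_points(limit):
    """All points m + k*sqrt(2) <= limit with integers m, k >= 0, not both zero."""
    root2 = math.sqrt(2.0)
    pts = []
    k = 0
    while k * root2 <= limit:
        m = 0 if k > 0 else 1          # exclude the point 0 itself
        while m + k * root2 <= limit:
            pts.append(m + k * root2)
            m += 1
        k += 1
    return sorted(pts)

def max_gap(pts, lo, hi):
    """Largest distance between neighbouring points of pts lying in [lo, hi]."""
    inside = [p for p in pts if lo <= p <= hi]
    return max(b - a for a, b in zip(inside, inside[1:]))

if __name__ == "__main__":
    pts = growth_points(70.0)
    for lo, hi in [(5, 10), (20, 30), (50, 60)]:
        print(f"largest gap of Z in [{lo}, {hi}]: {max_gap(pts, lo, hi):.4f}")
```

For a law concentrated at $1$ and $2$ (an arithmetic one) the same computation would give a gap equal to $1$ on every window, in agreement with case (2) of the proof.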


Lemma A9.3 Let $q(x)$ be a bounded uniformly continuous function given on $(-\infty, \infty)$ such that $q(x) \le q(0)$ for all $x$, and

\[ q(x) = \int_0^\infty q(x-y)\,dF(y). \tag{A9.4} \]

Then $q(x) \equiv q(0)$.

Proof Equation (A9.4) means that $q = q * F = \cdots = q * F^{*k}$ for all $k \ge 1$. The right-hand side of (A9.4) does not exceed $q(0)$, and hence, for $x = 0$, the equality (A9.4) is only possible if $q(-y) = q(0)$ for all $y \in Z_k$, where $Z_k$ is the set of points of growth of $F^{*k}$, and therefore $q(-y) = q(0)$ for all $y \in Z$. By Lemma A9.2 and the uniform continuity of $q$ this means that $q(-y) \to q(0)$ as $y \to \infty$. Further, for an arbitrarily large $N$ we can choose $k$ such that $q(x)$ will be arbitrarily close to $\int_N^\infty q(x-y)\,dF^{*k}(y)$, since $F^{*k}(N) \to 0$ as $k \to \infty$. This means, in turn, that $q(x)$ will be close to $q(0)$. Since $q(x)$ depends neither on $N$ nor on $k$, we have $q(x) \equiv q(0)$. The lemma is proved. □

Lemma A9.4 Let $g$ be a continuous function vanishing outside the segment $[0, b]$. Then the solution $G$ of the renewal equation (A9.2) is uniformly continuous and, for any $u$,

\[ G(x+u) - G(x) \to 0 \tag{A9.5} \]

as $x \to \infty$.


Proof By virtue of Lemma 10.2.3,

\[ |G(x+\delta) - G(x)| = \Big|\int_{x-b}^{x+\delta} \big(g(x+\delta-y) - g(x-y)\big)\,dH(y)\Big| \le \max_{0 \le x \le b+\delta} |g(x+\delta) - g(x)|\,\big(c_1 + c_2(b+\delta)\big). \tag{A9.6} \]

This means that the uniform continuity of $g$ implies that of $G$.

Now assume that $g$ has a continuous derivative $g'$. Then $G'$ exists and satisfies the renewal equation

\[ G'(x) = g'(x) + \int_0^x G'(x-y)\,dF(y). \]

Therefore the derivative $G'$ is bounded and uniformly continuous. Let

\[ \limsup_{x\to\infty} G'(x) = s. \]

Choose a sequence $t_n \to \infty$ such that $G'(t_n) \to s$. The family of functions $q_n$ defined by the equalities

\[ q_n(x) = G'(t_n + x) \]

is equicontinuous, and

\[ q_n(x) = g'(t_n+x) + \int_0^{x+t_n} q_n(x-y)\,dF(y) = g'(t_n+x) + \int_0^\infty q_n(x-y)\,dF(y). \tag{A9.7} \]

By the Arzelà–Ascoli theorem (see Appendix 4) there exists a subsequence $t_{n_r}$ such that $q_{n_r}$ converges to a limit $q$. From (A9.7) it follows that this limit satisfies the conditions of Lemma A9.3, and therefore $q(x) = q(0) = s$ for all $x$. Thus $G'(t_{n_r}+x) \to s$ for all $x$, and hence

\[ G(t_{n_r}+x) - G(t_{n_r}) \to sx. \]

Since the last relation holds for any $x$ and the function $G$ is bounded, we get $s = 0$. In the same way one shows that $\liminf_{x\to\infty} G'(x) = 0$, so that $G'(x) \to 0$, which implies (A9.5).

We have proved the lemma for continuously differentiable $g$. But an arbitrary continuous function $g$ vanishing outside $[0, b]$ can be approximated by a continuously differentiable function $g_1$ which also vanishes outside that interval. Let $G_1$ be the solution of the renewal equation corresponding to the function $g_1$. Then $|g - g_1| < \varepsilon$ implies $|G - G_1| < c\varepsilon$, $c = c_1 + c_2b$ (see Lemma 10.2.3), and therefore

\[ |G(x+u) - G(x)| < (2c+1)\varepsilon \]

for all sufficiently large $x$. This proves (A9.5) for arbitrary continuous functions $g$. The lemma is proved. □


Proof of Theorem A9.1 Consider an arbitrary sequence $t_n \to \infty$ and the measures $\mu_n$ generated by the functions

\[ H_{(n)}(u) = H(t_n+u) - H(t_n) \qquad \big(\mu_n([u,v)) = H_{(n)}(v) - H_{(n)}(u)\big). \]

These functions satisfy the conditions of the generalised Helly theorem (see Appendix 4). Therefore there exist a subsequence $t_{n'}$, the respective subsequence of measures $\mu_{n'}$, and a limiting measure $\mu$ such that $\mu_{n'}$ converges weakly to $\mu$ on any finite interval as $n' \to \infty$.

Now let $g$ be a continuous function vanishing outside $[0, b]$. Then

\[ G(t_{n'}+x) = \int_{-b}^{0} g(-u)\,dH(t_{n'}+x+u) = \int_{-b}^{0} g(-u)\,d\big(H(t_{n'}+x+u) - H(t_{n'})\big) \to \int_{-b}^{0} g(-u)\,\mu(x+du). \]

By Lemma A9.4, the sequence $G(t_{n'}+y)$ will have the same limit. This means that the measure $\mu(x+du)$ does not depend on $x$, and therefore $\mu([u,v))$ is proportional to the length of the interval $(u,v)$:

\[ \mu((u,v)) = c(v-u), \qquad \mu(du) = c\,du. \]

Thus, we have proved that

\[ G(t_{n'}+x) \to c\int_0^\infty g(u)\,du \tag{A9.8} \]
for any continuous function $g$ vanishing outside $[0, b]$. But for any Riemann integrable function $g$ on $[0, b]$ and given $\varepsilon > 0$ there exist continuous functions $g_1$ and $g_2$, $g_1 \le g \le g_2$, which are equal to 0 outside $[0, b+1]$ and such that

\[ \int_0^{b+1} (g_2 - g_1)\,du < \varepsilon. \]

This means that convergence (A9.8) also holds for any Riemann integrable function vanishing outside $[0, b]$.

Now consider an arbitrary directly integrable function $g$. By property (2) of such functions (see Definition 10.4.1) one can choose a $b > 0$ such that for the function

\[ g_{(b)}(u) = \begin{cases} g(u) & \text{if } u \le b, \\ 0 & \text{if } u > b, \end{cases} \]

the left- and right-hand sides of (A9.8) will be arbitrarily close to the respective expressions corresponding to the original function $g$ (for the right-hand side it is obvious, while for the left-hand side it follows from the convergence

\[ \Big|\int_0^t g(t-s)\,dH(s) - \int_0^t g_{(b)}(t-s)\,dH(s)\Big| = \Big|\int_0^{t-b} g(t-s)\,dH(s)\Big| \le (c_1 + c_2)\sum_{k \ge b-1} \overline{g}_k \to 0 \]

as $b \to \infty$ (see Lemma 10.2.3)). Therefore (A9.8) is proved for any directly integrable function $g$. Putting $g := 1 - F$ we obtain from Lemma A9.1

\[ 1 = c\int_0^\infty (1 - F(u))\,du = ac, \qquad c = \frac{1}{a}. \]

Thus the limit in (A9.8) is one and the same for any initial sequence $t_n$. From this it follows that, as $t \to \infty$,

\[ G(t) \to \frac{1}{a}\int_0^\infty g(u)\,du. \]

The theorem is proved. □


Theorem 10.4.1 is a simple consequence of Theorem A9.1 and the argument used in the proof of Theorem 10.2.3, in which the key renewal theorem in the arithmetic case was extended to the setting where $\tau_j$, $j \ge 2$, can assume values of different signs, while $\tau_1$ is arbitrary. We will leave it to the reader to apply the argument in the non-arithmetic case.

Now we will give several further consequences of Theorem A9.1. In Sect. 10.4 we obtained a refinement of the renewal theorem in the case when $m_2 := \mathbf{E}\tau_j^2 < \infty$. Approaches developed while proving Theorem A9.1 enable one to obtain an alternative proof of the following assertion coinciding with Theorem 10.4.4.
Theorem A9.2 Let the conditions of Theorem A9.1 be met and $m_2 < \infty$. Then

\[ 0 \le H(t) - \frac{t}{a} \to \frac{m_2}{2a^2} \quad\text{as } t \to \infty. \]

Proof The function $G(t) := H(t) - t/a$ is the solution of the renewal equation (A9.2) corresponding to the function

\[ g(t) := \frac{1}{a}\int_t^\infty (1 - F(u))\,du. \]

Since $g$ is directly integrable, we have

\[ G(t) \to \frac{1}{a}\int_0^\infty g(u)\,du = \frac{1}{a^2}\int_0^\infty\!\!\int_v^\infty (1 - F(u))\,du\,dv = \frac{m_2}{2a^2}. \]

The theorem is proved. □
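
A case where everything in Theorem A9.2 is explicit may help. The sketch below assumes, as an illustration not taken from the text, that $\tau$ has the Erlang law with shape 2 and rate $\lambda$, so that $a = 2/\lambda$, $m_2 = 6/\lambda^2$ and $m_2/(2a^2) = 3/4$. Since $T_k \le t$ exactly when a rate-$\lambda$ Poisson process has at least $2k$ points in $[0,t]$, one gets $H(t) = 1 + \mathbf{E}\lfloor N/2 \rfloor$ with $N$ Poisson with mean $\lambda t$. The name `renewal_function_erlang2` is hypothetical.

```python
import math

def renewal_function_erlang2(t, lam=1.0, tol=1e-15):
    """H(t) = sum_{k>=0} F^{*k}(t) when tau has the Gamma(shape 2, rate lam) law."""
    mu = lam * t
    # sum_{k>=1} P(T_k <= t) = sum_{k>=1} P(N >= 2k) = E floor(N/2), N ~ Poisson(mu)
    pmf = math.exp(-mu)                # P(N = 0)
    total, j = 0.0, 0
    while j <= mu or pmf > tol:        # run past the mode until terms are negligible
        total += (j // 2) * pmf
        j += 1
        pmf *= mu / j                  # P(N = j) from P(N = j - 1)
    return 1.0 + total                 # the term k = 0 contributes 1

if __name__ == "__main__":
    a = 2.0                            # mean of tau for lam = 1
    for t in (1.0, 5.0, 20.0):
        print(t, renewal_function_erlang2(t) - t / a)   # approaches m_2 / (2 a^2) = 0.75
```

In this example $H(t) - t/a = 3/4 + e^{-2\lambda t}/4$, so the difference is nonnegative and converges exponentially fast.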


Theorem A9.3 (The local renewal theorem for densities) Assume that $F$ has a density $f = F'$ and this density is directly integrable. Then $H$ has a density $h = H'$, and

\[ h(t) \to \frac{1}{a} \quad\text{as } t \to \infty. \]

Proof Denote by $f_n(x)$ the density of the sum $T_n = \tau_1 + \cdots + \tau_n$. We have

\[ h(t) = H'(t) = \sum_{n=1}^{\infty} f_n(t) = f(t) + \int_0^t h(t-u)f(u)\,du = f(t) + h * F(t). \]

This means that $h(t)$ satisfies the renewal equation with the function $g = f$. Therefore by Theorem A9.1,

\[ h(t) \to \frac{1}{a}\int_0^\infty f(u)\,du = \frac{1}{a}. \]

The theorem is proved. □
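
The series $h(t) = \sum_{n \ge 1} f_n(t)$ from the proof can be summed directly when the densities $f_n$ are known. The sketch below again assumes, for illustration only, the Erlang law with shape 2 and rate $\lambda$, for which $f_n$ is the Gamma$(2n, \lambda)$ density and the series sums to $(\lambda/2)(1 - e^{-2\lambda t})$, with limit $\lambda/2 = 1/a$. The name `renewal_density_erlang2` is hypothetical.

```python
import math

def renewal_density_erlang2(t, lam=1.0, tol=1e-16):
    """h(t) = sum_{n>=1} f_n(t), f_n being the Gamma(2n, lam) density; requires t > 0."""
    total, n = 0.0, 1
    while True:
        # logarithm of lam^(2n) t^(2n-1) exp(-lam t) / (2n-1)!
        log_term = (2 * n * math.log(lam) + (2 * n - 1) * math.log(t)
                    - lam * t - math.lgamma(2 * n))
        term = math.exp(log_term)
        total += term
        if 2 * n > lam * t and term < tol:   # past the largest term and negligible
            break
        n += 1
    return total

if __name__ == "__main__":
    for t in (0.5, 3.0, 30.0):
        print(t, renewal_density_erlang2(t), 0.5 * (1 - math.exp(-2 * t)))
```

Working with logarithms of the terms avoids overflow in $t^{2n-1}$ and $(2n-1)!$ for large $t$.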

Consider now some extensions of Theorem A9.1. A function $g$ given on the whole line $(-\infty, \infty)$ is said to be directly integrable if both functions $g(t)$ and $g(-t)$, $t \ge 0$, are directly integrable.

Theorem A9.4 If the conditions of Theorem A9.1 are met and $g$ is directly integrable, then

\[ G(t) = \int_0^\infty g(t-u)\,\mathbf{H}(du) \to \frac{1}{a}\int_{-\infty}^{\infty} g(u)\,du \quad\text{as } t \to \infty. \]

The proof can be obtained by making several small and quite obvious modifications to the argument in the proof of Theorem A9.1. The main change is that instead of functions $g$ vanishing outside $[0, b]$ one should now consider functions vanishing outside $[-b, b]$.

Another extension refers to the second version of the renewal function

\[ U(t) := \sum_{k=0}^{\infty} F^{*k}(t), \qquad -\infty < t < \infty, \]

in the case when $\tau_j$ can assume values of different signs.

Theorem A9.5 If $g$ is directly integrable and $\mathbf{E}\tau_j = a > 0$, then

\[ \int_{-\infty}^{\infty} g(t-u)\,\mathbf{U}(du) \to \frac{1}{a}\int_{-\infty}^{\infty} g(u)\,du \quad\text{as } t \to \infty, \]

and, for any fixed $u$, $U(t+u) - U(t) \to 0$ as $t \to -\infty$.

The proof is also obtained by modifying the argument proving Theorem A9.1 (see [13]).
Index of Basic Notation


Spaces and σ-algebras

$\mathfrak{F}$ — a σ-algebra, 14
$\langle\Omega, \mathfrak{F}\rangle$ — a measurable space, 14
$\mathbb{R}$ — the real line, 17
$\mathbb{R}^n$ — $n$-dimensional Euclidean space, 18
$\mathfrak{B}$ — the σ-algebra of Borel-measurable subsets of $\mathbb{R}$, 17
$\mathfrak{B}^n$ — the σ-algebra of Borel-measurable subsets of $\mathbb{R}^n$, 18
$\langle\Omega, \mathfrak{F}, \mathbf{P}\rangle$ — the probability space, 17

(Note that $\Omega$ and $\mathfrak{F}$ can take specific values, i.e. $\mathbb{R}$ and $\mathfrak{B}$, respectively.)

Distributions¹

$\mathbf{F}_\xi$, $\mathbf{F}$ — the distribution of the random variable $\xi$, 32
$\mathbf{I}_a$ — the degenerate distribution (concentrated at the point $a$), 37
$\mathbf{U}_{a,b}$ — the uniform distribution on $[a, b]$, 37
$\mathbf{B}_p$, $\mathbf{B}^n_p$ — the binomial distributions, 37
multinomial distributions, 47
$\mathbf{\Phi}_{a,\sigma^2}$ — the normal (Gaussian) distribution with parameters $(a, \sigma^2)$, 37, 48
$\varphi_{a,\sigma^2}(x)$ — the density of the normal law with parameters $(a, \sigma^2)$, 41
$\mathbf{F}_{\beta,\rho}$ — the stable distribution with parameters $\beta$, $\rho$, 231, 233
$f^{(\beta,\rho)}(x)$ — the density of the stable distribution $\mathbf{F}_{\beta,\rho}$, 235
$\varphi^{(\beta,\rho)}(t)$ — the characteristic function of the distribution $\mathbf{F}_{\beta,\rho}$, 231
$\mathbf{K}_{a,\sigma}$ — the Cauchy distribution with parameters $(a, \sigma)$, 38
$\mathbf{\Gamma}_\alpha$ — the exponential distribution with parameter $\alpha$, 38, 111
$\mathbf{\Gamma}_{\alpha,\lambda}$ — the gamma-distribution with parameters $(\alpha, \lambda)$, 176
$\mathbf{\Pi}_\lambda$ — the Poisson distribution with parameter $\lambda$, 39
$\chi^2$ — the $\chi^2$-distribution, 177
$\Lambda(\alpha)$ — the large deviation rate function, 244

¹ All distributions and measures are denoted by bold letters.
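Relations

$:=$ — means that the left-hand side is defined by the right-hand side, xi
$=:$ — means that the right-hand side is defined by the left-hand side, xi
$\sim$ — notation $a_n \sim b_n$ ($a(x) \sim b(x)$) means that $\lim_{n\to\infty} a_n/b_n = 1$ ($\lim_{x\to\infty} a(x)/b(x) = 1$), 109
$\stackrel{p}{\to}$ — convergence of random variables in probability, 129
$\stackrel{a.s.}{\to}$ — almost sure convergence of random variables, 130
$\stackrel{(r)}{\to}$ — convergence of random variables in the mean, 132
$\stackrel{d}{=}$ — notation $\xi \stackrel{d}{=} \eta$ means that the distributions of $\xi$ and $\eta$ coincide, 144
$\stackrel{d}{\le}$ — relation $\xi \stackrel{d}{\le} \eta$ means that $\mathbf{P}(\xi \ge t) \le \mathbf{P}(\eta \ge t)$ for all $t$, 302
$\stackrel{d}{\ge}$ — relation $\xi \stackrel{d}{\ge} \eta$ means that $\mathbf{P}(\xi \ge t) \ge \mathbf{P}(\eta \ge t)$ for all $t$, 302
$\subset\!=$ — notation $\xi \subset\!= \mathbf{F}$ means that $\xi$ has the distribution $\mathbf{F}$, 36
$\subset\!\Rightarrow$ — notation $\xi_n \subset\!\Rightarrow \mathbf{F}$ means that the distribution of $\xi_n$ converges weakly to $\mathbf{F}$, 144
$\Rightarrow$ — relation $\mathbf{F}_n \Rightarrow \mathbf{F}$ means weak convergence of the distributions $\mathbf{F}_n$ to $\mathbf{F}$, 141; for random variables, $\xi_n \Rightarrow \xi$ means that $\mathbf{F}_n \Rightarrow \mathbf{F}$, where $\xi_n \subset\!= \mathbf{F}_n$, $\xi \subset\!= \mathbf{F}$, 143

Conditions

[C] — the Cramér condition, 240
[$\mathbf{R}_{\beta,\rho}$] — conditions of convergence to the stable law $\mathbf{F}_{\beta,\rho}$, 229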


Subject  Index 


A 

Abelian  theorem,  673 

Absolutely  continuous  distribution,  40 

Absorbing  state,  393 

Absorption,  391 

Algebra,  14 

Almost  invariant 

random  variable,  498 
set,  497 

Amount  of  information,  448 
Aperiodic  Markov  chain,  419 
Arithmetic  distribution,  40 
Arzela-Ascoli  theorem,  657 
Asymptotically  normal  sequence,  187 
Atom,  419 
positive,  420 

B 

Basic  coding  theorem,  455 
Bayes  formula,  27 

Bernoulli  scheme,  local  limit  theorems  for,  113 

Bernstein  polynomial,  109 

Berry-Esseen  theorem,  659 

Beta  distribution,  179 

Binomial  distribution,  37 

Bochner-Khinchin  theorem,  158 

Borel 

a -algebra,  15 
set,  15 

Branching  process,  180,  591 
extinction of, 182
Brownian  motion  process,  549 

C 

Caratheodory  theorem,  19,  622 
Cauchy  sequence,  132 
Cauchy-Bunjakovsky  inequality,  87,  97 


Central  limit  theorem,  187 
for  renewal  processes,  299 
Central  moment,  87 
Chain,  Markov,  390,  414 
Chapman-Kolmogorov  equation,  582 
Characteristic  function,  153 

for  multivariate  distribution,  171 
Chebyshev  inequality,  89,  96 
exponential,  248 
Chi-squared  distribution,  177 
Class 

of  distributions 
exponential,  373 
superexponential,  373 
of  functions,  distribution  determining,  148 
Coefficient 

diffusion,  604 
shift,  604 

Common  probability  space  method,  118 
Communicating  states,  392 
Complement,  16 
Completion  of  measure,  624 
Component,  factorisation,  334 
Compound  Poisson  process,  552 
Condition 

Cramer,  240,  703 
Cramer on ch.f., 217
[D1], 188
[D2], 199
Lyapunov, 202, 560
[$\mathbf{R}_{\beta,\rho}$], 229, 687
Conditional 
density,  100 
distribution,  99 
distribution  function,  70 
entropy, 451

expectation,  70,  92,  94,  95 
probability,  22,  95 
Consistent  distributions,  530 
Continuity  axiom,  16 
Continuity  theorem,  134,  167,  173 
Converge 

in  measure,  630 
Convergence 

almost  everywhere,  630 
almost  surely,  with  probability  1,  130 
in  distribution,  143 
in  measure,  630 
in  probability,  129 
in  the  mean,  132 
in  total  variation,  653 
weak, 141, 173, 649
Correlation  coefficient,  86 
Coupling  method,  118 
Covariance  function,  611 
Cramer 

condition,  240,  703 
on  ch.f.,  217 
range,  256 
series,  248 
transform,  473 
Crossing  times,  237 
Cumulant,  242 
Cylinder,  528 

D 

Defect,  290 

Degenerate  distribution,  37 

De  Moivre-Laplace  theorem,  115,  124 

Density 

conditional,  100 
of  distribution,  40 
of  measure,  642 
transition,  583 

Derivative,  Radon-Nikodym,  644 
Deviation,  standard,  83 
Diffusion 

coefficient,  604 
process,  603 

Directly  integrable  function,  293 
Distance,  total  variation,  420 
Distribution,  17 

absolutely  continuous,  40 
arithmetic,  40 
beta,  179 
binomial,  37 
chi-squared,  177 
conditional,  99 
consistent,  530 
degenerate,  37 


Erlang,  177 

exponential,  38,  71,  177 
finite-dimensional,  528 
function,  32 

conditional,  70 
properties,  33 
gamma,  176 
Gaussian,  37 
geometric,  38 
infinitely  divisible,  539 
invariant,  404,  419 
lattice,  40 
Levy,  235 
multinomial,  47 

multivariate  normal  (Gaussian),  48,  173 
non-lattice,  160 
normal,  37 
of  process,  528 
of  random  process,  529 
of  random  variable,  32 
Poisson,  26,  39 
singular,  41,  325 
stable,  233 
stationary,  404,  419 
of  waiting  time,  350 
subexponential,  376,  675 
tail  of,  228 
uniform,  18,  37,  325 
uniform  on  a  cube,  18 
Dominated  convergence  theorem,  139 
Donsker-Prokhorov  invariance  principle,  561 
Doubly  stochastic  matrix,  410 

E 

Element 

random,  649 
Entropy,  448 

conditional, 451
Equality 

Parseval,  161 
Equation 

backward  (forward)  Kolmogorov,  587,  605 
Chapman-Kolmogorov,  582 
renewal,  716 
Equivalent 

processes,  530 
sequences,  109 
Ergodic 

Markov  chain,  404 
sequence,  498 
state,  411 

transformation,  498 
Erlang  distribution,  177 
Essential  state,  392 


Event, 2
certain,  16 
impossible,  16 
random,  xiv 
renovating,  509 
tail,  316 
Events 

disjoint  (mutually  exclusive),  16 
independent,  22 
Excess,  280 
Existence 

of  expectation,  65 
of  integral,  643 
Expectation,  65 

conditional,  70,  92,  94,  95 
existence  of,  65 
Exponential 

Chebyshev  inequality,  248 
class  of  distributions,  373 
distribution,  38,  177 
polynomial,  355,  366 
Extinction  of  branching  process,  182 

F 

Factorisation,  334 
component,  334 
Fair  game,  72 

Finite-dimensional  distribution,  528 
First  nonnegative  sum,  336 
First  passage  time,  278 
Flow  of  a -algebras,  457 
Formula 
Bayes,  27 

total  probability,  25 
Function 

covariance,  611 
directly  integrable,  293 
distribution,  32 
properties,  33 
large  deviation  rate,  244 
locally  constant,  373 
lower,  546 
rate,  244 

regularly  varying,  266,  665 
renewal,  279 
sample,  528 

slowly  varying,  228,  665 
subexponential,  376 
test (Lyapunov), 430
transition,  582,  583 
upper,  546 


G 

Gamma  distribution,  176 


Gaussian 

distribution,  37 
process,  614 
Generating  function,  161 
Geometric  distribution,  38 
Gnedenko  local  limit  theorem,  221 

H 

Hahn’s  theorem  on  decomposition  of  measure, 
646 

Harris  (irreducible)  Markov  chain,  424 
Helly  theorem,  655 
Holder  inequality,  88 
Homogeneous 

Markov  chain,  391,  416 
Markov  process,  583 
process,  539 
renewal  process,  285 


I 

Identity 

Pollaczek-Spitzer,  345 
Wald,  469 
Immigration,  591 
Improper  random  variable,  32 
Independent 

classes  of  events,  51 
events,  22 

random  variables,  153 
trials,  24 

Indicator  of  event,  66 
Inequality 

Cauchy-Bunjakovsky,  87,  97 
Chebyshev,  89,  96 
Chebyshev  exponential,  248 
Holder,  88 
Jensen,  88,  97 
Kolmogorov,  478 
Minkowski,  88,  133 
Schwarz,  88 
Inessential  state,  392 
Infinitely  divisible  distribution,  539 
Information,  448 
amount  of,  448 
Integrability,  uniform,  135 
Integral,  630,  632,  642 

of  a  nonnegative  measurable  function,  632 
over a set, 631
Integro-local  theorems,  216 
Invariance  principle,  567 
Invariant 

distribution,  419 
random  variable,  498 
set,  497 
Irreducible  Markov  chain,  393 
Iterated  logarithm,  law  of,  545,  546,  568 

J 

Jensen  inequality,  88,  97 

K 

Karamata  theorem,  668 
Kolmogorov 

equation,  backward  (forward),  587,  605 
inequality,  478 

theorem  on  consistent  distributions,  56,  625 

L 

Laplace  transform,  156,  241 
Large  deviation 
probabilities,  126 
rate  function,  244 
Large  numbers,  law  of,  107,  188 
for  renewal  processes,  298 
strong,  108 

Lattice  distribution,  40 
Law 

of  iterated  logarithm,  545,  546,  568 
of  large  numbers,  90,  107,  188 
for  renewal  processes,  298 
strong,  108 

Lebesgue  theorem,  644 

Legendre  transform,  244 

Levy  distribution,  235 

Limit  theorems,  local  for  Bernoulli  scheme, 

113 

Linear  prediction,  617 
Local  limit  theorem,  219 
Locally  constant  function,  373 
Lower 

function,  546 
sequence,  318 

Lyapunov  condition,  202,  560 

M 

Markov 

chain,  390,  414,  585 
aperiodic,  419 
ergodic,  404 
Harris  (irreducible),  424 
homogeneous,  391,  416 
periodic,  397,  419 
reducible  (irreducible),  393 
process,  580 

homogeneous,  583 
property,  390 
strong,  418 
time,  75 


Martingale,  457,  459 
Matrix 

doubly  stochastic,  410 
stochastic,  391 
transition,  391 
Mean  value,  65 
Measurable  space,  14 
Measure,  629 
density  of,  642 
extension,  19,  622 
theorem,  19,  622 
outer,  619 
signed,  629 
singular,  644 
space,  629 

Measure  preserving  transformation,  494 
Measure  Space,  629 
Metric  transitive 
sequence,  498 
transformation,  498 
Minkowski  inequality,  88 
Mixed  moment,  87 
Mixing  transformation,  499 
Modification  of  process,  530 
Moment 
central,  87 
k- th  order,  87 
mixed,  87 

Multinomial  distribution,  47 
Multivariate  normal  (Gaussian)  distribution, 
48,  173 

N 

Negatively  correlated  random  variables,  87 
Non-lattice  distribution,  160 
Normal  distribution,  37 
Null  state,  394 

O 

Oscillating  random  walk,  435 
Outer  measure,  619 
Overshoot,  280 

P 

Parseval  equality,  161 
Passage  time,  336 
Path,  528 

Pathwise  shift  transformation,  496 
Periodic 

Markov  chain,  397,  419 
state  of  Markov  chain,  394 
Persistent  state,  394 
Poisson 

distribution,  26,  39 
process,  297,  549 
theorem,  121 

Pollaczek-Spitzer  identity,  345 
Polynomial 

Bernstein,  109 
exponential,  355,  366 
Positive  atom,  420 
Positive  state,  394,  411 
Positively  correlated  random  variables,  87 
Posterior  probability,  28 
Prediction,  616 
linear,  617 
Prior  probability,  28 
Probability,  16 

conditional,  22,  95 
distribution,  17 
posterior,  28 
prior,  28 
properties  of,  20 
space,  17 

sample,  528 
wide-sense, 17
transition,  583 
Process 

branching,  180,  591 
Brownian  motion,  549 
compound  Poisson,  552 
continuous  in  mean,  536 
diffusion,  603 
distribution  of,  528,  529 
Gaussian,  614 
homogeneous,  539 
Markov,  580 
modification  of,  530 
Poisson,  297,  549 
random  (stochastic),  527,  529 
regenerative,  600 
regular,  532 
renewal,  278 

homogeneous  (stationary),  285 
semi-Markov,  593 
separable,  535 

stochastically  continuous,  536,  584 
strict  sense  stationary,  614 
unpredictable,  611 
Wiener,  542 
with  immigration,  591 
with  independent  increments,  539 
Prokhorov theorem, 651
Proper  random  variable,  73 
Property,  strong  Markov,  418 
Pseudomoment,  210 


Q 

Quantile,  43 
transform,  43 

R 

Radon-Nikodym  derivative,  642,  644 
Radon-Nikodym  theorem,  644 
Random 

element,  414,  649 
event,  xiv 
process,  527,  529 
sequence,  527 
variable,  31 

almost  invariant,  498 
complex-valued,  153 
defined  on  Markov  chain,  437 
distribution  of,  32 
improper,  32 

independent  of  the  future,  75 
invariant,  498 
proper,  73 
standardised,  85 
subexponential,  376,  675 
symmetric,  157 
tail,  317 
variables 

independent,  153 

positively  (negatively)  correlated,  87 
vector,  44 

walk,  277,  278,  335 
oscillating,  435 
skip-free,  384 
symmetric,  400,  401 
with  reflection,  434 
Range,  Cramér,  256 
Rate  function,  244 
Recurrent  state,  394 
Reflection,  391,  434 
Regeneration  time,  600 
Regenerative  process,  600 
Regression  line,  103 
Regular  process,  532 
Regularly  varying  function,  266,  665 
Renewal 

equation,  716 
function,  279 
integral  theorem,  280 
local  theorem,  294 
process,  278 
Renovating 
event,  509 

sequence  of  events,  509 
Right  closed  martingale  (semimartingale),  459 
Ring,  14 
S 

Sample 

function,  528 
probability  space,  528 
space,  414,  649 
Schwarz  inequality,  88 
Semi-invariant,  242 
Semi-Markov  process,  593 
Semimartingale,  458 
Separable  process,  535 
Sequence 

asymptotically  normal,  187 

Cauchy  (in  probability,  a.s.,  in  the  mean),  132 

ergodic,  498 

generated  by  transformation,  495 
lower,  318 

metric  transitive,  498 
renovating,  509 
stationary,  493 
stochastic,  457 
stochastic  recursive,  507 
tight,  148 

uniformly  integrable,  135 
upper,  318 

weakly  dependent,  499 
Series,  Cramér,  248 
Set 

almost  invariant,  497 
invariant,  497 
Shift  coefficient,  604 
σ-algebra,  14 
Signed  measure,  629 
Singular 

distribution,  41,  325 
measure,  644 
Skip-free  walk,  384 
Slowly  varying  function,  228,  665 
Space 

measurable,  14 
measure,  629 

of  functions  without  discontinuities  of  the  second  kind,  529 
probability,  17 
sample,  414,  649 
sample  probability,  528 
Spectral  measure,  556 
Stable  distribution,  233 
Standard  deviation,  83 
Standardised  random  variable,  85 
State 

absorbing,  393 
ergodic,  411 
essential,  392 


inessential,  392 
periodic,  394 
persistent,  394 
positive,  411 
recurrent,  394 
transient,  394 
State,  null,  394 
State,  positive,  394 
Stationary 

distribution,  404,  419 
of  waiting  time,  350 
process,  614 
sequence,  493 
of  events,  509 
Stochastic 
matrix,  391 
process,  527,  529 
recursive  sequence,  507 
sequence,  457 

Stochastically  continuous  process,  536,  584 
Stone-Shepp  integro-local  theorem,  216 
Stopping  time,  75,  462 
improper,  466 

Strong  law  of  large  numbers,  108 
Strong  Markov  property,  418 
Subexponential 

distribution,  376,  675 
function,  376 
random  variable,  376,  675 
Submartingale,  458,  459 
Sum,  first  nonnegative,  336 
Superexponential  class  of  distributions,  373 
Supermartingale,  458,  459 
Symmetric 

random  variable,  157 
random  walk,  401 

T 

Tail 

event,  316 
of  distribution,  228 
random  variable,  317 
Tauberian  theorem,  673 
Test  function,  430 
Theorem 

Abelian,  673 

Arzela-Ascoli,  657 

basic  coding,  455 

Berry-Esseen,  659 

Bochner-Khinchin,  158 

Caratheodory  (measure  extension),  19,  622 

central  limit,  187 

central  limit  for  renewal  processes,  299 
continuity,  134,  167,  173 
de  Moivre-Laplace,  115,  124 
dominated  convergence,  139 
Gnedenko  local  limit,  221 
Hahn’s  on  decomposition  of  a  measure, 
Helly,  655 

integral  renewal,  280 
integro-local,  216 
Karamata,  668 

Kolmogorov,  on  consistent  distributions,  56,  625 
Lebesgue,  644 
local  limit,  219 
local  renewal,  294 
measure  extension,  19,  622 
Poisson,  121 
Prokhorov,  651 
Radon-Nikodym,  644 
Stone-Shepp  integro-local,  216 
Tauberian,  673 
two  series,  322 
Weierstrass,  109 

Tight  family  of  distributions,  651 

Tight  sequence,  148 

Time 

first  passage,  278 
Markov,  75 
passage,  336 
regeneration,  600 
stopping,  75 
waiting,  349 

Total  probability  formula,  25,  71,  98 
Total  variation,  652 
convergence  in,  653 
distance,  420 
Trajectory,  528 
Transform 
Cramér,  473 
Laplace,  156,  241 
Legendre,  244 
quantile,  43 


Transformation 

bidirectional  preserving  measure,  495 
ergodic,  498 
metric  transitive,  498 
mixing,  499 
pathwise  shift,  496 
preserving  measure,  494 
Transient  state,  394 
Transition 
density,  583 
function,  582,  583 
matrix,  391 
probability,  583 

Triangular  array  scheme,  121,  188 
Two  series  theorem,  322 

U 

Undershoot,  290 
Uniform  distribution,  18,  37,  325 
Uniform  integrability,  135 
right  (left),  139 
Unpredictable  process,  611 
Upper 

function,  546 
sequence,  318 

V 

Variable,  random,  31 
Variance,  83 
Vector,  random,  44 

W 

Waiting  time,  349 

stationary  distribution  of,  350 
Wald  identity,  469 
fundamental,  471 
Walk,  random,  277,  278,  335 
Weak  convergence,  141,  173,  649 
Weakly  dependent  sequence,  499 
Weierstrass  theorem,  109 
Wiener  process,  542 


